Why TypeScript classes are a poor option for representing data
Eric Haynes
Posted on October 27, 2022
- Why TypeScript classes are a poor option for representing data
- Example Model
- JSON format
- Data structure is as important as data content
- Problems with classes
- They are incompatible with declarative syntax
- They break type safety!
- They promote mutation
- Decorators to the rescue? They cause even more problems
- You can't count on decorators anyway
- What are we actually defining?
- Conflict of interests
- Type proliferation
- Summary
- Up Next
Example Model
For the purpose of this article, we’ll be referring to this relatively simple chunk of data representing a User:
{
"firstName": "Joe",
"lastName": "Schmoe",
"age": 35,
"address": {
"streetAddress": "123 Some St",
"city": "Nowhere",
"state": "NY",
"postalCode": "12345"
},
"contact": [
{
"type": "phone",
"value": "212-555-1234"
},
{
"type": "email",
"value": "joe.schmoe@example.com"
}
]
}
Before jumping into specifics about problems with classes, let’s do a quick examination of TypeScript’s ability to represent data.
JSON format
JSON surpassed XML in the mid-2000s as the format of choice for serialized data. Presumably, most of you are familiar with the format, but it's worth stopping to consider just how simple it is. A huge share of the data exchanged on the Internet is expressed with just three data types and two collection types (!!!):
- string: an ordered sequence of characters
- number: numeric values (JavaScript parses them all as floating point)
- boolean: true or false
- array: an ordered list of any of the types
- object: a key/value store where keys are strings and values are any of the types
Additionally, there is null. Its specific semantics aren't defined, but in general, it serves as a placeholder to represent a known property that does not have a value in some context.
Note that JSON stands for “JavaScript Object Notation”. You can paste ANY valid JSON directly into an assignment in a JS or TS file and it will be valid! This is really important because, frankly, most other languages are abysmal at representing data in this format. When using JS/TS, one might be inclined to reach for patterns familiar from other languages/frameworks, but I think this is a mistake. Let's explore!
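To make the "paste it in" claim concrete, here is a minimal sketch using a trimmed version of the User payload. The variable names and the extra nickname field are illustrative assumptions, not part of the model above.

```typescript
// Pasted straight from a JSON document: the quoted keys are ordinary
// object-literal syntax in TypeScript.
const pasted = {
  "firstName": "Joe",
  "age": 35,
  "contact": [{ "type": "phone", "value": "212-555-1234" }],
  "nickname": null
}

// The same text, parsed at runtime, yields an equivalent structure.
const parsed = JSON.parse(
  '{"firstName":"Joe","age":35,"contact":[{"type":"phone","value":"212-555-1234"}],"nickname":null}'
)

// Serializing both gives identical output.
console.log(JSON.stringify(pasted) === JSON.stringify(parsed)) // true
```

The difference is that the compiler infers a static type for the pasted literal, while the result of JSON.parse is untyped until you validate it.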
Data structure is as important as data content
JSON rose to prominence because of readability. Without any mental overhead at all, you can look at the above and know everything about the User. It’s all in one place, and its layers of nesting serve as an organizational strategy.
Now, a simple exercise. At a glance, what is this image?
Can you tell? See the answer below:
The purpose of this is to demonstrate how classes arrange structured data. Since classes are flat, each layer of nesting requires a new class. Thus, we’re slicing up the data horizontally and then representing it in our code vertically. Furthermore, we have to define the lowest parts first and work our way back out to the “real” object.
A class structure for the above might look like:
class Address {
streetAddress: string;
city: string;
state: string;
postalCode: string;
}
class Contact {
type: string;
value: string;
}
class User {
firstName: string;
lastName: string;
age: number;
address: Address;
contact: Contact[];
}
All of the organization of the structure has been stripped away. We had to “promote” all of the elements into first class entities. We want to represent a User, but Address and Contact are now its equals. It’s not even immediately obvious which one is the "real" entity. You have to read through all of them, mentally keeping track of who depends on whom. Most people can hold between 5 and 9 “things” in their short term memory. For this simple structure, it’s doable, but it's not far from the limit of cognitive "juggling". Eventually, you'll get used to the structures and they'll become familiar through rote memorization, but you can't simply read it and understand the shape. It's not supposed to be a mental exercise just to figure out what a payload looks like!
To play devil’s advocate, if this were preferable, it would be serialized like this fictitious structure:
{
  "address.streetAddress": "123 Some St",
  "address.city": "Nowhere",
  "address.state": "NY",
  "address.postalCode": "12345",
  "contact[0].value": "212-555-1234",
  "contact[0].type": "phone",
  "contact[1].value": "joe.schmoe@example.com",
  "contact[1].type": "email",
  "firstName": "Joe",
  "lastName": "Schmoe",
  "age": 35
}
Now let’s consider how to represent this in idiomatic TypeScript:
type User = {
firstName: string
lastName: string
age: number
address: {
streetAddress: string
city: string
state: string
postalCode: string
}
contact: {
type: string
value: string
}[]
}
The static type declaration is fully declarative, and EXACTLY matches its serialized counterpart. I'll leave the quotes here to demonstrate how serialized JSON literally IS valid TypeScript:
const user: User = {
"firstName": "Joe",
"lastName": "Schmoe",
"age": 35,
"address": {
"streetAddress": "123 Some St",
"city": "Nowhere",
"state": "NY",
"postalCode": "12345"
},
"contact": [
{
"type": "phone",
"value": "212-555-1234"
},
{
"type": "email",
"value": "joe.schmoe@example.com"
}
]
}
This feature of TypeScript is HUGE, but many overlook the significance. You may not realize the amount of mental overhead you invest in hopping between representations of the exact same data, but it adds up.
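Keeping the declaration nested doesn't mean giving up names for the inner pieces. As a rough sketch, indexed access types can pull a nested shape back out of User when a function only cares about one part. The formatAddress helper is a hypothetical example, not something from the model above.

```typescript
type User = {
  firstName: string
  lastName: string
  age: number
  address: {
    streetAddress: string
    city: string
    state: string
    postalCode: string
  }
  contact: {
    type: string
    value: string
  }[]
}

// Indexed access types derive the nested shapes from the parent type,
// so there is still only one declaration to read and maintain.
type Address = User['address']
type Contact = User['contact'][number]

// Hypothetical helper that only needs the address portion.
const formatAddress = (address: Address): string =>
  `${address.streetAddress}, ${address.city}, ${address.state} ${address.postalCode}`

const phone: Contact = { type: 'phone', value: '212-555-1234' }

console.log(phone.type, formatAddress({
  streetAddress: '123 Some St',
  city: 'Nowhere',
  state: 'NY',
  postalCode: '12345',
}))
// phone 123 Some St, Nowhere, NY 12345
```

The derived types follow the parent automatically: change the address shape inside User and Address changes with it.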
Most of the statically typed languages in use today are not expressive enough to do this comfortably. In Java or Rust, for example, every nested level needs its own named type, so the declaration can't resemble the structure of the data. Languages such as C, C++, C#, Scala, Go, and Kotlin can nest structure declarations or literals to some degree, but they’re all very clumsy with it, and to my knowledge, none of them makes it as direct as pasting the serialized form into the source (comments welcome).
Here’s our image above in a declarative format:
Problems with classes as data models
Now, let’s examine some of the problems with classes as data models.
They are incompatible with declarative syntax
If this term is ambiguous, it's easiest to define in contrast with imperative syntax:
- imperative code describes the steps that the runtime must take to initialize an object
- declarative code describes the state of the initialized object
How we instantiated const user: User = { ... } above is declarative; we're declaring the object in its full state. Imperative code, on the other hand, declares the object, then describes all of the steps to build it:
const user = new User()
user.address = new Address()
user.address.streetAddress = '123 Some St'
user.address.city = 'Nowhere'
user.address.state = 'NY'
user.address.postalCode = '12345'
user.contact = []
const contact0 = new Contact()
contact0.value = '212-555-1234'
contact0.type = 'phone'
user.contact.push(contact0)
const contact1 = new Contact()
contact1.value = 'joe.schmoe@example.com'
contact1.type = 'email'
user.contact.push(contact1)
user.firstName = 'Joe'
user.lastName = 'Schmoe'
user.age = 35
If you spend even a short time focusing on writing code in a declarative style, the benefits to code clarity are immediately apparent. In many languages, imperative style is your only option, and to be blunt, this is an unreadable mess even with our simple data type.
Beyond that, we have no mechanism to ensure that we've actually filled in everything, as we already lied to the compiler and told it that all the values were present. It sees this as simply REassignment of a value, rather than an initialization. Thus, this becomes not just a matter of readability, but also one of functional correctness.
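To see what slips through, here is a small sketch. It uses TypeScript's definite assignment assertion (the ! after the property name), which is one way such a class gets past strict compiler settings; the trimmed Contact class and the variable names are assumptions for illustration.

```typescript
class Contact {
  // The `!` tells the compiler "trust me, this will be assigned".
  // It is a promise, not a check.
  type!: string
  value!: string
}

const contact = new Contact()
contact.type = 'phone'
// `value` was forgotten, yet the compiler still believes it is a string.
const seen: string = contact.value

console.log(typeof seen) // undefined

// The declarative form cannot make the same mistake. Uncommenting this
// fails to compile because `value` is missing:
// const literal: { type: string; value: string } = { type: 'phone' }
```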
And this brings me to the most important drawback...
They break type safety!
You might be wondering, “how can classes break type safety when they’re typed?” Consider a class like:
class Example {
name: string
}
This class is not valid TypeScript because it's literally impossible to create a valid instance of it. The type definition claims that, given an instance of Example, its name property is a string. But you can't construct one where that's true:
const instance = new Example()
// BROKEN!!!
console.log(instance.name.toUpperCase())
A strictly enforced null type is far more fundamental than you might realize coming from a background in... well, most other languages. TypeScript won't allow you to declare something to be non-null (or non-undefined) without ensuring that it is, in fact, non-nullish. If you've been running around with half of the TS compiler turned off (strict features disabled), it might not be obvious. Disabling the compiler option strictPropertyInitialization hides these infractions, but they are NOT valid TypeScript.
class Example {
// Property 'name' has no initializer and is not
// definitely assigned in the constructor. ts(2564)
name: string
}
With the full language enabled, you can ensure that no such broken classes can be declared by requiring that all non-optional properties are assigned to defined values at the time of construction:
class Example {
name: string
constructor(name: string) {
this.name = name
}
}
const instance = new Example('Joe')
// SUCCESS!!!
console.log(instance.name.toUpperCase())
Now, this is ok for a small handful of members. But data models may have dozens or even hundreds of fields. To be a valid class, you must have a constructor that accepts all of the fields of the object. We can have hundreds of constructor parameters and pray that we get them in the right order, or we can use "named parameters" a.k.a. a "parameters object". In the case of data objects, however, the parameters object would be an object that ALREADY FULLY ADHERES TO THE TYPE. To demonstrate the ridiculousness of this redundancy, there is literally no better way to declare such a constructor than to have an argument whose type is the class itself!
class Model {
first: string
second: string
third: string
fourth: number
constructor(values: Model) {
this.first = values.first
this.second = values.second
this.third = values.third
this.fourth = values.fourth
}
}
const params: Model = {
first: 'one',
second: 'two',
third: 'three',
fourth: 4,
}
const instance = new Model(params)
They promote mutation
Spread syntax (introduced for arrays in ES6 and extended to objects in ES2018) is one of the most powerful features ever added to any language I have used. It exposes and promotes a clear, concise way of performing a copy-modify, which means first class support for immutable practices. I generally prefer avoiding dogma, but one that I adhere to is this: Try to never mutate a value, and absolutely never mutate a value unless you're in full control of the object. You're only in full control of a variable if it's a local variable, and even then only as long as you haven't passed it anywhere else. If that's the case, mutate to your heart's content. But if you need to pass an object around to initialize it, consider instead retrieving the smaller objects and combining them:
const createUser = (id: string): User => ({
...getNameAndAge(id),
address: getAddress(id),
contact: getContactInfo(id),
})
Now you never have an invalid instance of the object.
Classes, however, completely ruin this marvelous feature. As an example, imagine Joe has a birthday, so we want to update Joe's age. In declarative format:
const updated: User = {
...user,
age: 36,
}
It's a clean and concise modification that is fully comprehensible without stepping through the interactions. You can simply read it as, "It's the current user, but with a different age". If we wanted to perform the same type of copy modify with classes, well, pile on more imperative code:
const updated = new User()
updated.firstName = user.firstName
updated.lastName = user.lastName
updated.address = user.address
updated.contact = user.contact
updated.age = 36
If anything new is added to User, you once again have no type safety here. The spread above is fully typesafe. If you add a pets section to User, nothing needs to change; you have compile time safety that no parameters are omitted, as well as type safety of the modified properties. In our latter imperative example (which requires disabling part of the compiler), you need to hunt for EVERY place that does that.
const updated = new User()
updated.pets = user.pets
// ...
As a result of this pain, the pressure is rather strong to simply do:
user.age = 36
This pushes the change back up the stack to anyone still holding this reference in a prior part of the code. Doing so makes all of that code non-deterministic. You can perform the same operation on the same value twice, and have no idea what you'll end up with, because any other part of the system could have pulled the rug out from underneath you.
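The difference is easy to observe at runtime. In this sketch, cache stands in for "anyone still holding this reference"; it and the trimmed User type are assumptions made for the example.

```typescript
type User = { firstName: string; age: number }

const joe: User = { firstName: 'Joe', age: 35 }

// Hypothetical holder elsewhere in the program that kept a reference.
const cache: { current: User } = { current: joe }

// Copy-modify: a new object is created and the cached one is untouched.
const updated: User = { ...joe, age: 36 }
console.log(cache.current.age) // 35

// In-place mutation: every holder of the reference sees the change.
joe.age = 37
console.log(cache.current.age) // 37
console.log(updated.age) // 36
```

If you want the compiler's help, typing parameters as Readonly<User> turns an accidental assignment like this into a compile error for top-level properties.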
MORE THAN ANY OTHER REASON, THIS IS WHY WE ALL FEAR THE TERM "LEGACY". Software systems built around shared mutable state grow in complexity exponentially. The number of places that might have set a value is
numProperties ** numPlacesYouHavePassedIt
You can't really even test a unit. It might work in a unit test without the other parts, but if other parts modify it, then your test is useless. It becomes impossible to even identify all of the code paths. Conversely, code using immutable practices grows in complexity linearly. This piece always takes this value in and sends that value out. Tested, done, and on to the next part.
Furthermore, immutable practices are actually a much better mirror of the manner in which software systems actually work; when a webservice receives a request, it can't simply mutate the body to give feedback to the caller. It can only use it to build an appropriate response. The lines are clear. "I can't modify this, because I didn't create it." If you treat function parameters the same way, your entire system becomes trivial to refactor into different services. Only when they rely on a shared set of state parameters do they become difficult to separate.
Decorators to the rescue? They cause even more problems
There seems to be a growing interest in decorators. While the implementation differs a bit, the applications to which they're applied are largely synonymous with Java Annotations. Essentially, they allow attaching metadata to classes and their fields and methods. At a glance, this seems like a nice way to abstract away some of the uglier details of dealing with data. Validation, conversion & translation, mapping to database entities, etc. are some of the more common ones. However, it's important to consider the origins of such patterns.
In Java, you're dealing with a language that has absolutely no typesafe dynamic programming capabilities. Let this sink in. Given an object and the name of one of its properties, it's impossible to retrieve that value in a manner that's statically type safe. You can check it at runtime and throw an error, or you can write code that just happens to never break it, but the compiler is blissfully unaware of it... (really, the Java compiler is blissfully unaware of anything unless you explicitly tell it, but I digress). In TypeScript, all of the native dynamic functionality is fully type-checked.
const obj = { hello: 'world' }
// Type 'string' is not assignable to type 'number'. ts(2322)
const value: number = obj['hello']
// Type 'number' is not assignable to type 'string'. ts(2322)
obj.hello = 123
vs.
class Obj {
public String hello;
}
// ...
Obj obj = new Obj();
obj.hello = "world";
Field field = Obj.class.getField("hello");
field.setAccessible(true);
// sure hope it's an int... the compiler doesn't know
int hello = field.getInt(obj);
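For comparison with the reflection call above, here is a sketch of the TypeScript counterpart: looking up a property by name while the compiler tracks the result type. The getProp helper and the greeting object are hypothetical names used only for this example.

```typescript
// K is constrained to the keys of T, and the return type T[K]
// follows whichever key is passed in.
const getProp = <T, K extends keyof T>(obj: T, key: K): T[K] => obj[key]

const greeting = { hello: 'world', count: 3 }

const hello: string = getProp(greeting, 'hello')
const count: number = getProp(greeting, 'count')

// Both of these are rejected at compile time:
// const wrong: number = getProp(greeting, 'hello')
// getProp(greeting, 'goodbye')

console.log(hello.toUpperCase(), count + 1) // WORLD 4
```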
Well, decorators are reintroducing all of this same untyped nonsense with reflect-metadata. Reflection is an absolutely horrible way to implement dynamic programming. You turn compile-time problems into runtime issues. There's really no excuse for this. There is no plausible reason for "advancing" this "feature" other than, "it's what we did in Java!" Why have static type checking when you can just add a comment to it?
The decorated class must implement the ExceptionFilter interface.
import { Catch } from '@nestjs/common';
@Catch(String, null, Catch, () => 'monkeys like bananas')
export class ExceptionFilter extends Date {}
Some of these could be solved, but it's ridiculously complicated, and the decorator champions sure as hell aren't doing it... and some of them are just plain impossible. E.g.
@Injectable()
export class SomeService {
constructor(@Inject(APPLE) apple: Orange) {}
}
You can't count on decorators anyway
It's important to note that JavaScript doesn't really have classes. It's all syntactic sugar over object creation. There is no significant runtime distinction between an instance of a "class" and some other object that happens to have the same fields. This is perfectly fine... until you start defining behavior in a "magic bucket of untyped metadata", which is what decorators do. Let's explore:
Say we have a decorator called Max that, on setting a numeric value, throws an error if the assigned value is greater than the parameter. The details of its implementation are out of scope for this article, but let's take for granted that it enforces the rule. Using it would look like:
class Material {
name: string
// decorator prevents setting the value to a number greater than 10
@Max(10)
quantity: number
}
// ...
material.quantity = 20 // ERROR!
The decorator is statically defined in the class definition, so we know for certain that no instance of Material will have a quantity value greater than 10... right? So we attempt to rely on this behavior...
const NUCLEAR_REACTOR_MELTDOWN_THRESHOLD = 15
// We feel safe & secure...
const addNuclearMaterial = (material: Material) => {
if (material.name === 'uranium' && reactor.uraniumCount <= 5) {
try {
reactor.addUranium(material.quantity)
} catch (error) {
log.info(
'Too much uranium! Everyone would die!',
)
}
}
}
addNuclearMaterial({ name: 'uranium', quantity: 20 })
// ensuing loud explosion
There is no way for the compiler to differentiate between an object created with new Material and a regular object of the correct shape. In fact, it's not even enforced in the "class" itself!
class Material {
constructor(name: string, quantity: number) {
return { name, quantity }
}
name: string
@Max(10)
quantity: number
}
Thus, you can't count on any of the behavior defined in decorators without checking that the object was, in fact, created as an instance of the decorated class. In short, you can't truly count on any of the behavior defined by decorators in any other part of the codebase.
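A short sketch of that gap, leaving the decorator out entirely so it runs as plain TypeScript. The howCreated function is a hypothetical helper; the point is only that both calls compile, and that the sole way to tell the two values apart is a runtime check.

```typescript
class Material {
  constructor(public name: string, public quantity: number) {}
}

// The parameter type accepts anything structurally compatible with
// Material, so only a runtime check can tell the two cases apart.
const howCreated = (material: Material): string =>
  material instanceof Material ? 'constructed with new' : 'plain object'

console.log(howCreated(new Material('uranium', 5)))
// constructed with new

console.log(howCreated({ name: 'uranium', quantity: 20 }))
// plain object
```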
Furthermore, static attachment on the type is generally not the correct place for this anyhow. 10 might be an appropriate max for one use case, but not another:
const addCoolant = (material: Material) => {
reactor.addCoolant(material)
}
const material = new Material()
material.name = 'water'
material.quantity = 50
// we needed that much coolant, so... ensuing loud explosion
addCoolant(material)
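One alternative, sketched here under assumptions, is to state the limit where it is used rather than on the type. The requireMaxQuantity helper and the specific limits (10 for fuel, 100 for coolant) are invented for illustration.

```typescript
type Material = { name: string; quantity: number }

// Hypothetical helper: builds a check for a given limit. The check
// returns the material unchanged or throws.
const requireMaxQuantity = (max: number) => (material: Material): Material => {
  if (material.quantity > max) {
    throw new RangeError(
      `${material.name}: quantity ${material.quantity} exceeds ${max}`,
    )
  }
  return material
}

// Each use case states its own limit (values are assumptions).
const checkFuel = requireMaxQuantity(10)
const checkCoolant = requireMaxQuantity(100)

console.log(checkCoolant({ name: 'water', quantity: 50 }).quantity) // 50

try {
  checkFuel({ name: 'uranium', quantity: 20 })
} catch (error) {
  console.log((error as Error).message) // uranium: quantity 20 exceeds 10
}
```

Because the check is an ordinary function over plain data, it applies equally to object literals and to anything else of the right shape.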
What are we actually defining?
Even in cases other than validation where we really are only describing metadata, the static type definition is still a bad place for it. A deep dive into functional patterns is out of scope for this article, but one of the key differences between OO (or what passes for it these days) and functional programming is that the latter makes a distinction between data and behavior. In OO, you define “things”, and often a thing both “is stuff” and “does stuff”. Functional patterns differentiate data, which “is stuff”, and functions, which “do stuff”. I think the latter is a far better way to design software, because that’s how computer systems actually work. Files, http requests/responses, databases, etc. deal with “dumb” data. Programs, webservices, functions describe the behavior of what you want to DO with the various forms of data sent their way.
The point of this digression is that decorators change a model from being a plain definition into a mix & match of data and contextual behavior. If you add decorators for database details, you don’t just have a User, you have a UserInSomeSpecificDatabase. If you add decorators for api docs, serialization, & validation, you have a UserInSomeSpecificWebservice. Sometimes, you can get away with just one for both (UserInSomeSpecificDatabaseInSomeSpecificWebservice), but sooner or later, some of those things are going to start to conflict.
Conflict of interests
With functional patterns, each function can define some specific operation to do WITH some data:
declare function readUserFromRequest(request: Request): User
declare function readUserFromDatabase(db: DbClient): User
declare function readUserFromTheStars(telescopeData: StarMapping): User
The types are merely a definition of either inputs or outputs. With decorators, you end up in one of 2 scenarios:
Best case, it’s cluttered as hell. Here is a realistic example of a db entity that is also exposed via a webservice endpoint:
class Address {
@Column('street_address')
@ApiProperty()
@IsString()
@IsNotEmpty()
streetAddress: string
@Column()
@ApiProperty()
@IsString()
@IsNotEmpty()
city: string
@Column()
@ApiProperty()
@IsString()
@IsNotEmpty()
state: string
@Column('postal_code')
@ApiProperty()
@IsString()
@IsNotEmpty()
postalCode: string
}
class Contact {
@Column()
@ApiProperty()
@IsString()
@IsNotEmpty()
type: string
@Column()
@ApiProperty()
@IsString()
@IsNotEmpty()
value: string
}
class User {
@Column('first_name')
@ApiProperty()
@IsString()
@IsNotEmpty()
firstName: string
@Column('first_name') // oops, wrong column
@ApiProperty()
@IsString()
@IsNotEmpty()
lastName: string
@Column()
@ApiProperty()
@IsNumber()
@Min(0)
age: number
@OneToOne(() => Address)
@JoinColumn()
@ApiProperty()
@ValidateNested()
@Type(() => Address)
address: Address
@OneToOne(() => Contact)
@JoinColumn()
@ApiProperty()
@IsArray()
@ValidateNested()
@Type(() => Contact)
contact: Contact[]
}
It’s a stroke of luck that they don’t conflict (or worse, maybe they do, but the conflicts elude your tests). Which brings us to our other option: the decorators DO conflict with each other, leading to…
Type proliferation
When decorators conflict, your only option is to break the class into separate, contextual versions. You end up with a JsonUser and a DbUser and a UserWhoOwnsADonkey. While not an industry accepted term, I refer to this as "type proliferation".
When you have multiple types representing the same object, these have to be kept in sync (except, of course, for their subtle differences). Those differences often elude the compiler completely. As awesome as TypeScript’s type aliases and interfaces are for allowing a representation of object literals, they're not a great mechanism to keep implementing classes synchronized because of optional properties. There is no way to define an interface that says, "implementations must have this optional property". Consider:
export interface User {
firstName: string;
lastName: string;
age?: number;
address: Address;
contact: Contact[];
}
export class UserClass {
firstName: string;
lastName: string;
// age?: number; <-- no warning or error!!!!
address: Address;
contact: Contact[] = [];
}
The entire point of static typing is that you can make a canonical representation of something, but this context problem makes that impossible. Canonical representations get far too little credit, but when you have core business models, there is immense value in consistency. No matter what part of a distributed system you're in, a consistent definition means you can count on the names and value types being the same. Frontend, backend, doesn't matter. Even in contexts where we have shed the type definition, if the data was created using a common type, you can simply and safely assume consistency even in different languages, request bodies, queue payloads, file dumps, nosql tables, and many others. Transformations of the data because of some second-order consequence of the representation leads to inconsistencies, and those lead to bugs.
In short, if your types are purely type definitions, it's easy to have them in a common place. If they're classes with contextual decorators, however, they end up strewn all over the place. One library that defines a User won't want to include the decorators that go with the other. This makes it impossible to share a canonical representation.
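As a rough sketch of the alternative, the shared definition can stay a plain type while each context keeps its own concerns in a function. Everything here beyond the User field names is an assumption: the UserRow shape, its column names, and both function names are hypothetical.

```typescript
// Shared, context-free definition (trimmed to three fields for brevity).
type User = { firstName: string; lastName: string; age: number }

// Hypothetical database row; the column names are assumptions.
type UserRow = { first_name: string; last_name: string; age: number }

// Database concern: column naming lives in the mapping function.
const userFromRow = (row: UserRow): User => ({
  firstName: row.first_name,
  lastName: row.last_name,
  age: row.age,
})

// Webservice concern: validate an untrusted payload before trusting it.
const userFromJson = (input: unknown): User => {
  if (typeof input !== 'object' || input === null) {
    throw new TypeError('payload is not an object')
  }
  const { firstName, lastName, age } = input as Record<string, unknown>
  if (
    typeof firstName !== 'string' ||
    typeof lastName !== 'string' ||
    typeof age !== 'number'
  ) {
    throw new TypeError('payload is not a User')
  }
  return { firstName, lastName, age }
}

const fromDb = userFromRow({ first_name: 'Joe', last_name: 'Schmoe', age: 35 })
const fromApi = userFromJson(
  JSON.parse('{"firstName":"Joe","lastName":"Schmoe","age":35}'),
)

console.log(JSON.stringify(fromDb) === JSON.stringify(fromApi)) // true
```

Neither function needs to know the other exists, and the User type can live in a package that depends on neither the database client nor the web framework.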
Summary
In short, using classes for data modeling ruins many of the features that make TypeScript a great language, and offers absolutely no substantive benefits to counter the limitations. Classes make your code less readable, harder to follow, and less flexible. I posit that the only reason anyone would gravitate towards this pattern is familiarity with other languages where it's the only option. In porting solutions designed for those languages, we're also porting in the limitations they were designed to work around. Similarly, the fact that classes are only a crude approximation of the construct of the same name in class-based languages means that those patterns aren't even as reliable as they were in the languages that bred them.
Up Next
In our next episode, we'll cover why classes are bad for... everything else!