Tagged Unions and ReScript Variants
Patrick Ecker
Posted on May 15, 2021
In JavaScript there are many situations where we want to express certain shapes of an object based on the conditions of its attributes, e.g.
// Plain JS - Typical Redux Action types
if (action.type === "addUser") {
  const user = action.user;
  createUser(user);
}
if (action.type === "removeUser") {
  const userId = action.userId;
  removeUser(userId);
}
You can find this pattern in many other scenarios, such as representing the method of a request (req.method === "POST" -> req.body != null), representing UI state (userReq.isLoading -> userReq.name == undefined), or even error state (result.err != null -> result.msg != undefined). The shape of the object is different, depending on the state of attributes defined by a specific ruleset.
In TypeScript, we'd use a so-called Discriminated Union Type (also known as a Tagged Union) to encode the conditional object shape within the type itself. For our previous example, we would define a type for a user action like this:
// TypeScript
type AddUser = {
  type: "addUser",
  user: User
};

type RemoveUser = {
  type: "removeUser",
  userId: string
};

type UserAction = AddUser | RemoveUser;
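To make the benefit of the discriminant concrete, here is a small sketch of how TypeScript narrows a UserAction when we check action.type. The User type and the describeAction function are hypothetical and were added only for illustration. The original example leaves User undefined.

```typescript
// Hypothetical User type; the original example does not define it.
type User = { id: string; name: string };

type AddUser = { type: "addUser"; user: User };
type RemoveUser = { type: "removeUser"; userId: string };
type UserAction = AddUser | RemoveUser;

// Checking action.type narrows `action` to one member of the union,
// so each branch can only access fields that exist on that member.
function describeAction(action: UserAction): string {
  if (action.type === "addUser") {
    return `add ${action.user.name}`; // action is AddUser here
  }
  return `remove ${action.userId}`; // action is RemoveUser here
}

console.log(describeAction({ type: "addUser", user: { id: "1", name: "Ada" } }));
console.log(describeAction({ type: "removeUser", userId: "1" }));
```

Accessing action.userId inside the first branch would be a compile error, because AddUser has no such field.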
As a ReScript developer, you have probably had trouble writing FFI (interop) code to represent such Tagged Unions. How can we handle these data structures without changing their JS representation?
Usually we'd define a variant to represent different shapes of data, but unfortunately variants do not compile to the same runtime shape as user-defined Tagged Unions.
This article demonstrates in a practical example how we'd map data structures for RichText data (designed as a Tagged Union) to ReScript variants.
Important: We'll only discuss mapping ReScript variants to immutable JS values, since mutations to the original values may not be reflected in already classified variants at runtime. Handling mutable data requires a different strategy, which is not covered in this post.
Background on the Use-Case
This post is based on a real-world use case where I needed to represent Storyblok CMS' RichText data structures within ReScript but couldn't find any proper documentation on how to do this.
I kept the data model simple so that it only captures the basic concepts. For a more thorough side-by-side implementation of a TS / ReScript Storyblok RichText model, including rendering logic, you can check this repository later on.
Design RichText Data with TypeScript
To kick things off, we'll define some basic RichText elements we want to be able to represent: Text, Paragraph and Doc. These will be defined as a Tagged Union called RichText:
interface Text {
  type: "text";
  text: string;
}

interface Paragraph {
  type: "paragraph";
  content: RichText[];
}

interface Doc {
  type: "doc";
  content: RichText[];
}

export type RichText =
  | Doc
  | Text
  | Paragraph;
Each case of the RichText type listed above has one common attribute, type, which helps the type system differentiate the shape of a given value by checking value.type, e.g. via an if or switch statement. Let's see that in action:
// Recursively iterate through the RichText tree and print all Text.text contents
function printTexts(input: RichText) {
  switch (input.type) {
    case "doc":
    case "paragraph":
      return input.content.forEach(printTexts);
    case "text": {
      console.log(input.text);
      break;
    }
  }
}

const input: RichText = {
  type: "doc",
  content: [
    {
      type: "paragraph",
      content: [{type: "text", "text": "text 1"}]
    },
    {
      type: "paragraph",
      content: [{type: "text", "text": "text 2"}]
    }
  ]
};

printTexts(input);
TypeScript will be able to infer the relevant data for each case correctly most of the time.
There are a few things I personally dislike in TS when handling Tagged Unions (especially via switch statements):
- switch statements are not expressions (you can't return a value without wrapping a function around them)
- cases need extra braces to scope variable declarations, and need a break / return statement to prevent case fall-through
- Without any return statements or other trickery, TS does not do any exhaustiveness checks within switches
- Discriminated union types are really noisy in type-space code, and I often had a hard time navigating / writing types, even in smaller codebases
- switch statements can only match one value at a time. More complex discriminants / multiple discriminants are impractical
- Object types are structurally typed, and TS will not always infer the type correctly without a type annotation (as seen in the const input declaration above). Error messages are generally harder to read because of that.
... but these are all just opinions.
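The "trickery" mentioned above usually means an exhaustiveness check based on the never type. If a case is forgotten, the value that reaches the default branch is no longer never, and the compiler reports an error. Below is a minimal sketch built on the RichText model. It assumes two hypothetical helpers, assertNever and countTexts. The element interfaces are renamed to TextElement, ParagraphElement and DocElement so that the snippet does not merge with the DOM's global Text interface when compiled as a standalone script.

```typescript
interface TextElement { type: "text"; text: string }
interface ParagraphElement { type: "paragraph"; content: RichText[] }
interface DocElement { type: "doc"; content: RichText[] }
type RichText = DocElement | TextElement | ParagraphElement;

// Hypothetical helper: it only accepts `never`, so the call site below
// type-checks only when every member of the union was handled before it.
function assertNever(value: never): never {
  throw new Error(`Unhandled RichText element: ${JSON.stringify(value)}`);
}

// Counts Text elements in the tree. Deleting any `case` below makes
// `input` in the default branch non-never, so assertNever(input) fails to compile.
function countTexts(input: RichText): number {
  switch (input.type) {
    case "doc":
    case "paragraph":
      return input.content.reduce((sum, child) => sum + countTexts(child), 0);
    case "text":
      return 1;
    default:
      return assertNever(input);
  }
}
```

The check only works because every case returns. That is the "return statements" part of the trade-off listed above.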
In the next step, let's discover how we'd represent that data model in ReScript.
Representing Tagged Unions in ReScript
We now have an existing RichText representation, and we want to write ReScript FFI (interop) code to represent the same data without changing the JS parts.
ReScript's type system can't express Tagged Unions in the same way as TypeScript does, so let's take a step back:
The core idea of Tagged Unions is to express an "A or B or C" relation and to access different data, depending on which branch we are currently handling. This is exactly what ReScript variants are made for.
So let's design the previous example with the help of variants. We'll start by defining our type model within our RichText.res module:
// RichText.res
module Text = {
  type t = {text: string};
};

type t;

type case =
  | Doc(array<t>)
  | Text(Text.t)
  | Paragraph(array<t>)
  | Unknown(t);
As you can see, there's not much going on here. Let's go through it quickly:
- We defined a submodule Text with a type t representing a Text RichText element. We refer to this type via Text.t.
- type t; represents our actual Tagged Union RichText element. It doesn't have any concrete shape, which makes it an "abstract type". We'll also call this type RichText.t later on.
- Lastly, we defined our case variant, describing all the different cases as defined by the Tagged Union in TS. Note how we also added an Unknown(t) case to be able to represent malformed / unknown RichText elements as well.
With these types we can fully represent our data model, but we still need to classify incoming JS data into our specific cases. Just as a quick reminder: the RichText.t type internally represents a JS object with the following shape:
{
  type: string,
  content?: ..., // exists if type = "doc" | "paragraph"
  text?: ..., // exists if type = "text"
}
Let's add some more functionality to reflect that logic.
Classifying RichText.t data
We will extend our RichText.res module with the following functions:
// RichText.res
module Text = {
  type t = {text: string};
};

type t;

type case =
  | Doc(array<t>)
  | Text(Text.t)
  | Paragraph(array<t>)
  | Unknown(t);

// Note: typeof null === "object", so null / undefined are ruled out first
let getType: t => string = %raw(`
  function(value) {
    if (value != null && typeof value === "object" && value.type != null) {
      return value.type;
    }
    return "unknown";
  }
`)

// Falls back to an empty array if content is missing or not an array
let getContent: t => array<t> = %raw(`
  function(value) {
    if (value != null && typeof value === "object" && Array.isArray(value.content)) {
      return value.content;
    }
    return [];
  }
`)

let classify = (v: t): case =>
  switch v->getType {
  | "doc" => Doc(v->getContent)
  | "text" => Text(v->Obj.magic)
  | "paragraph" => Paragraph(v->getContent)
  | "unknown"
  | _ => Unknown(v)
  };
The code above shows everything we need to handle incoming RichText.t values.
Since we are internally handling a JS object and need access to the type and content attributes, we defined two unsafe raw functions, getType and getContent. Both functions receive a RichText.t value and extract the appropriate attribute while making sure the data is correctly shaped: getType falls back to "unknown" (which classify turns into an Unknown value), and getContent falls back to an empty array.
Now with those two functions in place, we are able to define the classify function to refine our RichText.t into case values. It first retrieves the type of the input v and returns the appropriate variant constructor (with the correct payload). Since this code uses raw functions and relies on Obj.magic, it is considered unsafe code. In this particular scenario, the unsafe code is at least isolated in the RichText module (make sure to write tests!).
Note: You might have noticed that we store the content part of a "doc" object directly in the Doc(array<t>) variant constructor. Since we know that our Doc model does not contain any other information, we went ahead and made our model simpler instead.
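If you want to test the same classification rules from the JS side, here is a rough TypeScript analogue of getType, getContent and classify. It is a sketch under our own assumptions, not part of the ReScript module. The Case union and its kind field are hypothetical stand-ins for the case variant. Unlike the ReScript version, which uses Obj.magic for the Text case, this sketch also checks that text is a string.

```typescript
// Hypothetical mirror of the ReScript `case` variant; `kind` is our own tag name.
type Case =
  | { kind: "Doc"; content: unknown[] }
  | { kind: "Text"; text: string }
  | { kind: "Paragraph"; content: unknown[] }
  | { kind: "Unknown"; value: unknown };

// Anything that is not an object with a string `type` becomes "unknown".
function getType(value: unknown): string {
  if (typeof value === "object" && value !== null) {
    const type = (value as { type?: unknown }).type;
    if (typeof type === "string") return type;
  }
  return "unknown";
}

// Missing or non-array `content` falls back to an empty array.
function getContent(value: unknown): unknown[] {
  if (typeof value === "object" && value !== null) {
    const content = (value as { content?: unknown }).content;
    if (Array.isArray(content)) return content;
  }
  return [];
}

function classify(value: unknown): Case {
  switch (getType(value)) {
    case "doc":
      return { kind: "Doc", content: getContent(value) };
    case "paragraph":
      return { kind: "Paragraph", content: getContent(value) };
    case "text": {
      // Extra runtime check that the ReScript version skips.
      const text = (value as { text?: unknown }).text;
      return typeof text === "string" ? { kind: "Text", text } : { kind: "Unknown", value };
    }
    default:
      return { kind: "Unknown", value };
  }
}
```

The design mirrors the ReScript module. All unsafe casts live in three small functions, so callers only ever see a well-typed Case.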
Using the RichText module
Now with the implementation in place, let's showcase how we'd iterate over RichText data and print every Text content within all paragraphs:
// MyApp.res
// We simulate some JS object coming into our system
// ready to be parsed
let input: RichText.t = %raw(`
  {
    type: "doc",
    content: [
      {
        type: "paragraph",
        content: [{type: "text", "text": "text 1"}]
      },
      {
        type: "paragraph",
        content: [{type: "text", "text": "text 2"}]
      }
    ]
  }
`)

// keyword rec means that this function is recursive
let rec printTexts = (input: RichText.t) => {
  switch (RichText.classify(input)) {
  | Doc(content)
  | Paragraph(content) => Belt.Array.forEach(content, printTexts)
  | Text({text}) => Js.log(text)
  | Unknown(value) => Js.log2("Unknown value found: ", value)
  };
};
printTexts(input);
As you can see in the printTexts function above, we call RichText.classify on the input parameter. For the Doc | Paragraph branch, we can safely unify the content payloads (both are of type array<RichText.t>) and recursively call printTexts again. In case of a Text element, we can deeply access the record attribute RichText.Text.text, and for the Unknown case, we directly log the value of type RichText.t, which is the original JS object (Js.log is able to log any value, no matter which type).
In contrast to the TS switch statement, let's talk about the control flow structures here (namely the ReScript switch expression):
- A switch is an expression. The last expression of each branch is its return value. You can even assign it to a binding (let myValue = switch("test") {...})
- Each branch must return the same type (which forces simpler designs)
The most important part is that we have the full power of pattern matching, which can be performed on any ReScript data structure (numbers, records, variants, tuples, ...). Here is just one small example:
switch (RichText.classify(input)) {
| Doc([]) => Js.log("This document is empty")
| Doc(content) => Belt.Array.forEach(content, printTexts)
| Text({text: "text 1"}) => Js.log("We ignore 'text 1'")
| Text({text}) => Js.log("Text we accept: " ++ text)
| _ => () /* "Do nothing" */
};
- Doc([]): "Match on all Doc elements with 0 elements in their content"
- Doc(content): "For every other content (> 0 elements) do the following..."
- Text({text: "text 1"}): "Match on all Text elements where element.text = 'text 1'"
- Text({text}): "For every other Text element with a different text do the following..."
- _ => (): "For everything else (_), do nothing (())"
Extending the RichText data model
Whenever we want to extend our data model, we just add a new variant constructor to our case variant and add a new pattern match within our classify function, e.g.:
type case =
  | Doc(array<t>)
  | Text(Text.t)
  | Paragraph(array<t>)
  | BulletList(array<t>) // <-- add a constructor here!
  | Unknown(t);

let classify = (v: t): case =>
  switch (v->getType) {
  | "doc" => Doc(v->getContent)
  | "text" => Text(v->Obj.magic)
  | "paragraph" => Paragraph(v->getContent)
  | "bullet_list" => BulletList(v->getContent) // <-- add a case here!
  | "unknown"
  | _ => Unknown(v)
  };
It's that easy.
Note on Runtime Overhead
It's worth noting that our RichText module approach introduces the following overhead:
- Variants with payloads are separate values at runtime, so every classify call allocates a new value wrapping the variant content (on top of the cost of the extra classify call itself).
- Our getContent and getType functions do extra checks on the structure of each input value.
Please note that the ReScript compiler team is currently investigating a better runtime representation for variants, to map more seamlessly to JS and improve performance in the future.
Note on Recursion
I am aware that the examples used in this article are not stack-safe. This means that you can blow the call stack if the recursion gets deep enough. There are ways to make the examples stack-safe; I just tried to keep them simple.
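One common way to make such a traversal stack-safe is to replace recursion with an explicit stack. The following TypeScript sketch is our own illustration, not the article's code. It collects texts in the same order as the recursive printTexts, but returns them instead of logging them so the result is easy to check. The element interfaces are again renamed, because TypeScript would otherwise merge them with the DOM's global Text interface.

```typescript
interface TextElement { type: "text"; text: string }
interface ParagraphElement { type: "paragraph"; content: RichText[] }
interface DocElement { type: "doc"; content: RichText[] }
type RichText = DocElement | TextElement | ParagraphElement;

// Depth-first traversal with an explicit stack instead of recursion:
// nesting depth only grows the `stack` array, not the JS call stack.
function collectTexts(root: RichText): string[] {
  const result: string[] = [];
  const stack: RichText[] = [root];
  while (stack.length > 0) {
    const node = stack.pop()!;
    if (node.type === "text") {
      result.push(node.text);
    } else {
      // Push children in reverse so they are popped in document order.
      for (const child of [...node.content].reverse()) {
        stack.push(child);
      }
    }
  }
  return result;
}
```

The trade-off is readability. The recursive version mirrors the shape of the data, while this version trades that for constant call-stack depth.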
Conclusion
We started out by defining a very simple version of (Storyblok based) RichText data structures in TypeScript and highlighted some aspects of Discriminated Unions / Tagged Unions.
Later on we created FFI code wrapping variants around the same RichText data structures. We created a RichText.res module and defined a data model with a case variant and a classify function to be able to parse incoming data. We used pattern matching to access the data in a very ergonomic way.
We only scratched the surface here. I hope this article gave you an idea on how to design your own ReScript modules to tackle similar problems!
In case you are interested in more ReScript-related topics, make sure to follow me on Twitter.
Special thanks to hesxenon and cristianoc for the extensive technical reviews and discussions!