Making Structs Fun and Interesting
Steven L
Posted on July 18, 2022
Organizing data into easy-to-use structures is one of the biggest underpinnings of programming. An organized, named structure of data helps us to write cleaner and more efficient code that is shareable with others.
However, I also think structs are ugly, and require too much mental overhead. Remembering the way to initialize each and every struct is a pain, and most of the time we would prefer to use smart constructors to execute the logic we would want anyway.
Or we might choose to use self-mutating methods to build up a struct over time, which is the result of method chaining. This is a much better method in my eyes because you don't have to fill in all the details of a struct at the struct's initiation point.
As a primary Racket user, I want to share structs in other languages and stack it up against Racket, and share the rabbit hole of struct initialization and how to improve it.
Structs in Rust
A struct in Rust is no different than that of C or C++ or even D, really. A struct is a list of fields containing some data, and we apply names that we wish to bind the fields to such that it's easy to access them later on. This is basically attaching named fields to an array of data, but this array of data doesn't have to be the same type.
struct MyData {
pub a: i16,
pub b: u32,
pub c: f32,
}
To use this, we have to define a variable binding and create the struct and initialize all it's fields.
let my_point = MyData { a: 5, b: 77, c: 3.5 };
But if the struct had a lot more fields, this would start to run on quite a bit.
let my_long_struct = MyLongExample {
a: 0, b: 1, c: 2,
d: 3, e: 4, f: 5,
g: 6, // ...
};
It would be at this point you would want to consider a smart constructor method that would fill in fields that you don't need immediately, like say you only wanted one or two fields to be filled while the rest get defaulted to their zero-type states (numbers are zeroed, Booleans are false, strings are empty, etc).
A smart constructor has a number of uses, but filling in un-needed values is one, but another can be satisfying an invariant rule in a larger data structure. Sometimes on struct initialization you need to execute some logic, so it's best off resting in a constructor method that is easy to use for all.
impl MyData {
pub fn new(a: i16, b: u32) -> MyData {
return MyData { a: a, b: b, c: 0.0, };
}
}
While it's an improvement, long constructor method calls can still become a headache, and after some time, you may decide it's better to create a more method-chain-oriented way of creating structures. Instead of stuffing all your data into one function call, you can instead create several methods which pack and store the data for you, so it's less of an eye sore.
impl MyData {
pub fn new(a: i16, b: u32) -> MyData { ... }
pub fn empty() -> MyData {
return MyData{ a: 0, b: 0, c: 0 };
}
pub fn set_a(mut self: MyData, a: i16) -> MyData {
self.a = a;
return self;
}
pub fn set_b(mut self: MyData, b: u32) -> MyData {
self.b = b;
return self;
}
}
Which can be used in the following way at the initialization site:
let mystruct = MyData::empty()
.set_a(5)
.set_b(7);
Now it's more clear what you're setting, and the logic can be further restrained or better defined within these setter methods. That way, users cannot break any rules with your structs later on, especially if your values have to be within certain ranges.
The limitation with using self
variables inside a struct implementation is that the self
value must always de-reference to the struct type itself, and thus it is cumbersome to add additional abstract typing like with Option<T>
or Result<T,E>
to the implementation. Even if you did, it's tricky to unpack all the layers of pattern matching that this could incorporate without it being inside another Result<T, E>
function to encapsulate the ?
operator.
I am a big fan of method chaining for struct initialization, as it gives a good compromise between satisfying the invariants of the program and making it less clumsy to build structs from your cool source code library.
Now I will look at Racket, and I will show you why I think it's clumsy in it's current given state.
Structs in Racket
Racket is a Lisp/Scheme family language that enables you to do so much more than most other languages. It can be functional, it can be object-oriented, it can be used to design GUI applications, games, or even backend servers to serve web pages and other content. Racket, in my eyes, is a very cool language and I love talking about it.
However, one thing that has bothered me for a long time is how boring the structs are. Structs are sort of like that of C/C++, D, Zig or Rust, but not really. Since everything in Racket is a type of linked list, a struct is a way of defining a group of functions to create that linked list, and automatically create the functions needed to traverse said list.
(struct Point (x y z) #:transparent)
(define p1 (Point 3 4 5))
(displayln p1)
; shows #(struct:Point 3 4 5)
The magic behind the struct
macro is that it generates a bunch of boilerplate methods alongside your layout.
Looking at the above example, the values that get generated for us are:
-
Point
, the constructor -
Point-x
, thex
value getting function -
Point-y
, they
access -
Point-z
, thez
access -
Point?
, the predicate true/false function
Since a struct
's underlying data is a linked list (or heck, it might not be a linked list but instead could be a hash map), creating the getter methods to get the fields is a trivial macro to implement. Same as the Point?
function, all trivial macro implementations.
My issues with this generic struct
macro implementation is that it suffers from the same thing as other struct
language issues: once the fields become too many, initializing structs becomes more tedious than anything, as you have to remember the exact order of the values you want, making it rather difficult. Even more so difficult in a Lisp-like language.
(struct VeryLong (a b c d e f g ...))
(define my-data
(VeryLong 1 2 3 4 5 6 7 ...))
In this example, because it's not required to have the field names when creating a struct, it gets a little hard to memorize exactly what each field is doing. Even if you named all the fields, when you create the struct, the field names aren't required, making it more unclear what you're really building. You would then have to look back at the source code to figure out what the fields you're filling out are.
Then we might consider building a smarter constructor to help us further constrain the program invariants, which is a smart thing to do. You can even use named fields with the smart constructor so you can provide default values and mix in things as you want.
(define (Builder #:value1 [v1 0]
#:value2 [v2 "hi"]
#:value3 [v3 '(1 2 3 4 5)])
(SomeStruct v1 v2 v3))
(define mydata (Builder #:value1 500 #:value3 '(4 5))
; #(struct:SomeStruct 500 "hi" (4 5))
This is a smart solution for defining a better interface for your library, but the code can feel a bit like too much, as the list of keyword arguments can run on quite long the further your struct goes.
Let's look at the method-chaining path from Rust used earlier - say we were given some struct, and we wanted to copy it so that one field would be changed. There exists a macro to do this called struct-copy
, which will copy a struct, and given a pair list of IDs and new values, it will modify those fields with the supplied values.
(struct Point (x y z) #:transparent)
(define p1 (Point 3 4 5))
(define p2
(struct-copy Point p1 [x 7] [z 9]))
(displayln p2)
; #(struct:Point 7 4 9)
struct-copy
is a macro that generates code to create a new struct from the old one. If we were to do this without struct-copy
, it would look a little... odd, I guess.
(define p1 (Point 3 4 5))
; copy the original values without struct-copy
(define p2
(Point (Point-x p1)
(Point-y p1)
(Point-z p1))
; use the old struct, but supply new values
(define p3
(Point (Point-x p2) 8 (Point-z p2)))
struct-copy
is a handy macro for this situation because copying struct information is a boring chore to do.
To enable some kind of method-chaining, we have to write some function that will copy the old struct state and copy it into a new one via struct-copy
. This can be written as a plain function.
(define (change-x old-state new-value)
(struct-copy Point old-state [x new-value]))
Great, now we can use this.
(define p1 (Point 3 4 5))
(define p2 (change-x p1 700))
; #(struct:Point 700 4 5)
... Frankly I'm unsure if this changed anything. If we continued to write more methods to satisfy the other fields:
(define (change-y old-state new-value)
(struct-copy Point old-state [y new-value]))
(define (change-z old-state new-value)
(struct-copy Point old-state [z new-value]))
It's still not super useful because data has to be stored somewhere in-between each call. You have to define a new binding each time you want to use these functions, or you have to chain them all together in one very large composition.
(define p3
(change-z
(change-x
(change-y p2 700)
800)
900))
; #(struct:Point 800 700 900)
What a mess. It would be better if we could somehow compose this into a cleaner way of modifying old structs. Granted, if we reversed the positions of the new value and the old struct, it would look different like:
(define p3
(change-z 900
(change-x 800
(change-y 700 p2))))
Still, maybe there's a better way.
Folding Functions over State
Our means of composing struct-copy
functions isn't very pleasant in it's current form, but maybe there's a way of changing that. In this section I will introduce a way of folding functions over state.
Miniature glossary going forward:
- a Function is a piece of code that accepts a value and returns some output
- State is some value representing the state of a system, ie a number or a collection of numbers/values
- Folding is the idea of "folding" a function over a list of values, by reducing them into a final value (like folding paper until there's no way to fold it any further)
What we have here is an example of functions and state: our structs are our state, and we want to create some functions that can copy the old state and return some kind of new, updated state.
Given a function f
and some state s
, we can "run" a function by giving the state to the function. We can define this as the following:
(define (run-state f s)
(f s))
It almost seems redundant, but it will become important in a bit.
The next part is we need to re-write our custom struct modifying methods. Instead of writing a function that accepts some state and a new value, let's change it to curried form where it accepts a value, then yields a function that then accepts some state, then applies struct-copy
.
(define (change-x new-value)
(lambda (old-state)
(struct-copy Point old-state [x new-value])))
What this gives us is not a function that directly manipulates state, but instead gives us a function that will later manipulate state. It's a little different, but I think it's cool since it takes advantage of currying.
(define set-x-500 (change-x 500))
(define p1 (Point 3 4 5))
(define p2 (set-x-500 p1))
;#(struct:Point 500 4 5)
We can do the same for the other fields as well.
(define (change-y new-value)
(lambda (old-state)
(struct-copy Point old-state [y new-value])))
(define (change-z new-value)
(lambda (old-state)
(struct-copy Point old-state [z new-value])))
Now we're getting somewhere. We have an indirect method for creating setter functions on a given struct. Now we need a way to glue this all together. In it's current form, it would still look like this:
(define p1 (Point 3 4 5))
(define p2
((change-x 500)
((change-y 300) p1)))
Little weird, but we can re-write this using a folding function.
Just for an example, a fold is a way of reducing a list of values by supplying some initial value as the start, and we have a "reducer" function repeatedly apply to the values as it traverses a list.
(foldl + 0 '(1 2 3 4 5))
; 15
; same as
; (+ 5 (+ 4 (+ 3 (+ 2 (+ 1 0)))))
It starts with a call that looks like (+ 0 1)
, which is the first value of the list with the initial value given to the fold function (0
in this case). The way it works is that it uses a two-argument function to take in some value and apply a function with some "accumulated" value.
This function is what I will call the "reducer", as it takes in all these values and tries to reach some final value after a series of calls.
(define (reducer value acc)
(+ value acc))
(foldl reducer 0 '(1 2 3 4 5))
; 15
Let's re-think this a bit: does the list of values have to be some list of atomic values like numbers? No! It can be anything, because Racket is a dynamic language that treats all values as just ordinary citizens. This means that list of values can actually be a list of functions, too!
; from earlier
(define (run-state f s)
(f s))
; use the `add1` function 3 times
(foldl run-state 0 (list add1 add1 add1))
; 3
Since we changed our struct modifying functions into curried form, they return a function that changes some state, meaning we can actually re-write all of our struct-copy
wrapper functions into a foldl
function call.
(foldl run-state (Point 0 0 0)
(list
(change-x 500)
(change-y 600)
(change-z 700)))
; #(struct:Point 500 600 700)
That almost looks like method-chaining from Rust! To simply leave some fields as zero, you can take them out from the given list entirely, or you can put as many of them as you want. I wouldn't say this is ground-breaking stuff, but it certainly opens the doors for much cleaner and precise code that your users (and you) might fall in love with.
One thing that's lacking here is a better way of creating empty structs, which we did in the Rust section, so I'll fill that in now.
(define (init-point)
(Point 0 0 0))
(foldl run-state (init-point)
(list
(change-x 500)
(change-y 600)
(change-z 700)))
Now you don't have to initialize a struct yourself and can just use the plain initializer to set it to some default struct.
At this point, we have created a unique way of defining our structs, and this can be helpful in creating libraries and constraining the data in the ways you need. But there's one last thing that I want to go over real quickly: creating a macro.
Macro'ing it All Together
A macro is a way of generating code in Racket, and most other languages. While regular functions operate on data, a macro operates on the actual code you write. A macro returns a list of code that is to then be evaluated by the Racket VM.
Macros simplify programming, and languages that support writing macros are fascinating because of all the ways you can factor out things you don't like writing, simplifying things and creating your own miniature domain-specific language for a problem.
I am not going to make it more complex, but right now, if you wanted to define a struct using our new functions, it's not exactly pleasant to have to manually write out the foldl
call yourself, nor would it be fun to make your users do that each time.
Instead, it might be user to write a function that can do it for you.
(define (make-point . funs)
(foldl run-state (init-point) funs))
(define p1
(make-point
(change-x 500)
(change-y 600)))
; #(struct:Point 500 600 0)
That's better, and thanks to the way Racket functions can handle multiple values, the function list is binded to the funs
argument, which is supplied to foldl
easily.
Even still, looking at this it's still fairly exhausting. My criticism is that it's still too verbose. You know you want to create a point, so why not just get straight to the point? This is where a macro will kick in and help us to reduce the code.
To give us a brief primer on macros, let's just write a simple macro that will double a value by quite literally "doubling" it. To define macros we use define-syntax-rule
.
(define-syntax-rule (double x)
'(x x))
; (double 5)
; '(5 5)
; (double "hello")
; '("hello" "hello")
; (double it)
; '(it it)
Racket takes this rule and performs a language transformation by substitution. Whatever the value of x
is in this macro is substituted with whatever we gave to the macro. It doesn't do variable expansion at this point, because it's just performing a literal translation. When we supply it
, it doesn't look for any kind of variable binding for it
, it turns it into a symbol due to the literal translation.
When writing macros, all you do is return plain Racket code. The code won't be evaluated until later, so we can use it to automatically generate code for us.
Let's look at a macro that handles many values and treats it as a list. This will involve a special name binding known as the ellipsis, or ...
, which gives us a way of binding many values to one binding in the macro space. We can translate code into a begin
sequence by simply attaching the list of code to the original begin
macro.
(define-syntax-rule (my-begin code ...)
(begin code ...))
(my-begin
(displayln 1)
(displayln 2))
; 1
; 2
Not very complex, but it acts as a way of translating code from one form to another. At this level, the define-syntax-rule
macro is one of the simpler ways of writing macros, and it only lets us get so far, but it's good enough for right now.
My goal is to create a macro that will do the above (define (make-point ...)
code for us, so we don't have to always do it that way. It seems redundant, so maybe let's write a macro to make it a bit easier.
(define-syntax-rule (define-point name code ...)
(define name
(foldl run-state (init-point) (list code ...))))
So instead of doing:
(define p1
(make-point
(change-x 500)
(change-y 600)))
We can now do:
(define-point p1
(change-x 500)
(change-y 600))
By taking advantage of the ellipsis, we can translate a list of syntax into a list of code to be used by the foldl
function, and supplying some of the basic values to it like run-state
and init-point
, it acts as a nice and clean way of creating our Point struct.
It saves a few lines of code, but the name of the macro makes it clear you are in fact trying to define a new Point struct, and supplying the variable name gives it to the original define
macro, so really it's a nice way of skipping a few steps and making it easier to do whatever your library is trying to do.
Conclusion
Structs are great, but the more complex your struct gets, the harder it is to maintain the code surrounding it. Even adding one field to the struct can mean tons of changes to all the original call-sites.
Smart constructors and method-chaining are good ideas for offsetting the amount of tedious labor you would have to do to provide the correct changes. On top of that, method-chaining is a nice and interactive way of building up your data.
For Racket, these approaches aren't baked in, but it doesn't mean impossible. The magic of macros makes it easy to fill in the blanks and create a more optimized experience for yourself and your users. I have been employing this folding-function method for a little bit now and thought to share it with all.
Thanks for reading!
(Header image credit @jannerboy62 on Unsplash)
Posted on July 18, 2022
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.
Related
November 29, 2024