The Strange Art of Code Reuse, Part 3

armousness

Sean Williams

Posted on July 5, 2022

The Strange Art of Code Reuse, Part 3

The first two parts of this series talked about dynamic typing, then object-oriented programming. Along the way, I tried to get across some notions of what polymorphism is. When you have a function take an argument of some type, there are two issues: first, what operations can be done with that type, and second, what are the semantics of that type?

The semantics of primitive types have been ingrained in us since elementary school. Integers are for counting, rational numbers are for ratios, booleans are for logic, and strings are for reading and writing. Once you start structuring data, you need to come up with your own semantics: vec3 is a position in a three-dimensional vector space, array is an ordered list of things, Person is information about a person in the real world.

I dislike dynamic typing because it has sloppy semantics, and you can even see that at the primitive level. JavaScript coerces all numbers to double, which degrades its use for counting. This also means that you don't know off hand whether a number in JavaScript is for counting or for ratios.

Object-oriented programming encodes semantics, particularly for structured data, into a sort of data taxonomy. If your data don't rigidly conform to your taxonomy, then you either have to refactor, or you have to hack. Interfaces ease the restrictiveness of class hierarchies, but in my experience this fact isn't appreciated as much as it should be. Even then, interfaces have the problem that you can't generally tack them onto classes you bring in from the outside (i.e., from an API).

This brings us to Haskell, the ivory tower of programming languages. It's not the tallest tower out there, but that doesn't make it any less ivory.

eat :: ∀t, t ∈ Edible: t → IO ()

For lack of a better term, I'll be referring to the Haskell-style type system as category programming. At first blush, category programming is focused on interfaces, though they're called typeclasses. The caveat is that interfaces can be implemented on existing types, including primitive types. For example, the Haskell analog of ToString is a typeclass called Show, defined as follows:

class Show t where
    show :: t -> String
Enter fullscreen mode Exit fullscreen mode

While the syntax may be unfamiliar, you probably already know everything going on here. t is a generic type parameter, much like std::vector<'t> in C++ or List<'t> in C#. Like interfaces, the typeclass Show defines one or more function "templates," so all we specify here is a signature. :: is a type annotation, and -> indicates a function.

So Show is a typeclass containing one function, show, which takes a value of type t and returns a String. We can then add types to a typeclass with instance. For example,

-- Define a Vec2 type, which just contains two doubles
-- Don't stress the syntax right now
data Vec2 = Vec2 Double Double

instance Show Vec2 where
    show (Vec2 a b) = "(" ++ show a ++ ", " ++ show b ++ ")"
Enter fullscreen mode Exit fullscreen mode

The basic Haskell library already implements Show on many types, including Double—much like how Dotnet defines ToString on double. Let's dive into the original example problem, of making a generic function to add elements to a collection:

{-# LANGUAGE FlexibleInstances, MultiParamTypeClasses #-}
import Data.Set as Set

class AddToCollection c e where
   add_to_collection :: e -> c e -> c e

instance AddToCollection [] t where
    add_to_collection e c = e : c

instance Ord t => AddToCollection Set t where
    add_to_collection e c = Set.insert e c
Enter fullscreen mode Exit fullscreen mode

Okay so there's a lot going on here. Let's agree to ignore FlexibleInstances and MultiParamTypeClasses and focus on the meat. First and foremost, we had no difficulty putting basic API types into this class, which is great. But what's going on with these type signatures?

Let's start from a more basic point, which is "currying." In ML-style languages (including Haskell and F#), you don't surround arguments to function calls in parentheses. Instead, functions have "free variables," and applying a value to a function "closes" one of them. So Set.insert is a function with two free variables, t -> Set t -> Set t. We apply a value e to it, which closes it to Set t -> Set t, then we apply a value c to it, which closes it all the way to just Set t.

This same logic applies to types. Dotnet Generic.List is a type function, while Generic.List<int> is a type. Generic.List has a free type parameter; passing it the type int gives us a concrete type. Dotnet requires you to surround type arguments in angle-brackets, while Haskell allows type arguments to be curried.

So let's look at

class AddToCollection c e where
   add_to_collection :: e -> c e -> c e
Enter fullscreen mode Exit fullscreen mode

again. c is a type function, and e is a concrete type. The add_to_collection function takes an element (of type e), and a collection (of type c e). At this point, this isn't very different from a C# function taking arguments of type int and List<int>. The difference is, to my knowledge, C# doesn't let you define interfaces with type functions as generics.

And what about this:

instance Ord t => AddToCollection Set t where
    add_to_collection e c = Set.insert e c
Enter fullscreen mode Exit fullscreen mode

The default Data.Set is tree-based, and operations on sorted trees require the elements of the tree to be comparable, with > and <. Since Haskell was made by mathematicians, the class of types that define less-than and greater-than is called ordinal, meaning, they can be ordered. Ord t is a type constraint: we say that t must be ordinal.

This also lets us take a glimpse at the formality behind all of this. The reason we need the Ord t constraint is because of the type of Set.insert:

insert :: Ord t => t -> Set t -> Set t
Enter fullscreen mode Exit fullscreen mode

The add_to_collection function, as defined for Set t, applies the Set.insert function to e and c. Set.insert requires that its first argument be of an ordinal type, and that its second argument be a Set of ordinal types. This requirement gets re-imposed on our function, add_to_collection.

Now, what exactly is t? Let's again compare it to something more familiar: what is an integer, or maybe, what is a 32-bit integer? The answer to that is, it's a whole number between -2,147,483,648 and 2,147,483,647. So int32 isn't a particular value, but it's a range of values. Better said, int32 is a set, and a value of type int32 is a member of that set.

t is similar: t is the set of all possible types. Ord t => narrows this down, where t is any type in the set Ord. Remember, Ord is a typeclass, and types can be added to a typeclass with instance. Many types are in there by default (including Int and Char), and we can add any other types we want.

Earlier we defined a type of two-dimensional vectors. As it stands, this doesn't work:

data Vec2 = Vec2 Double Double

vec2set = add_to_collection (Vec2 5 2) Set.empty
-- error: No instance for (Ord Vec2) arising from a use of 'add_to_collection'
Enter fullscreen mode Exit fullscreen mode

Okay, so we need Vec2 to provide an instance of Ord...

instance Ord Vec2 where
    compare (Vec2 a1 b1) (Vec2 a2 b2) =
        if a1 == a2 then compare b1 b2 else compare a1 a2
-- error: No instance for (Eq Vec2) arising from the superclasses of an instance declaration
Enter fullscreen mode Exit fullscreen mode

It turns out, classes can require other classes, and Ord requires Eq, which defines equality. Okay, so...

instance Eq Vec2 where
    (Vec2 a1 b1) == (Vec2 a2 b2) = a1 == a2 && b1 == b2
Enter fullscreen mode Exit fullscreen mode

And now we're good, we can now have a Set of Vec2 since we've satisfied all the requirements (including the earlier definition of Show, which is required to pass something to print):

main :: IO ()
main = print (add_to_collection (Vec2 5 2) Set.empty)
-- output: fromList [(5.0, 2.0)]
Enter fullscreen mode Exit fullscreen mode

This probably shows you why Haskell has the reputation it does: To explain object-oriented programming, I just gestured at the idea of taxonomy and your intuition did the rest of the work. To really explain category programming, we had to get knee-deep in theory. The plus side is, this is a system of such staggering flexibility that, as long as you keep your distance from the Entscheidungsproblem, you can do nearly anything you want.

I'm going to close out on one cool feature that you kind of need this level of complexity to tackle. I mentioned way back at the beginning that dictionaries are weird. If you have a dictionary that contains an entry "a": 5 and you add "a": 3, what's the correct thing to do? As usual this depends on your dictionary's semantics, but often the best thing to do is to set "a" to 8.

But let's think big: what I'm really talking about is having insertion semantics that are specific to the type of the dictionary's values. The dictionary type in Haskell is called Map, and as in Dotnet's Generic.Dictionary, it has two type parameters. Because of this, let's sidestep an even more obscure issue called "higher-kinded polymorphism" and define a new typeclass:

class AddToMap k v where
    add_to_map :: k -> v -> Map k v -> Map k v
Enter fullscreen mode Exit fullscreen mode

Map defines a function insertWith: we pass it a function, and if a key is already present in the Map, it combines the old and new values using that function; otherwise it inserts the value. We can use it to create instances of AddToMap that are specialized to particular value types:

instance Ord k => AddToMap k String where
    add_to_map k v c = Map.insertWith (++) k v c

instance Ord k => AddToMap k Int where
    add_to_map k v c = Map.insertWith (+) k v c
Enter fullscreen mode Exit fullscreen mode

++ is string/list concatenation, while + has the usual meaning. So if we have a Map of Strings, overwriting an existing key instead concatenates, while if we have a Map of Ints, overwriting an existing key instead adds:

main :: IO ()
main = do
    print (add_to_map "a" (5::Int) (add_to_map "a" 3 Map.empty))
    print (add_to_map "message" "hello" (add_to_map "message" " world" Map.empty))

-- output:
-- fromList [("a",8)]
-- fromList [("message","hello world")]
Enter fullscreen mode Exit fullscreen mode

Conclusion

There's no question that category polymorphism is the most powerful of the type systems considered here. This doesn't mean I think everyone should use Haskell, or that you should try to convince your company to switch to Haskell, or anything like that. The point of this series was to be informative, not persuasive. Indeed, in practice, there are enormous drawbacks to using Haskell, but understanding Haskell can teach you a lot about computer science.

That is the one thing I'll definitively say about all this: I don't care whether you use Haskell, but you should absolutely learn it. I mean, I hardly ever use it, but the knowledge I gained from learning it is indispensable.

This is also the core reason that Rust is my current favorite language: Rust's traits are very similar to Haskell's typeclasses, but I think Rust has better handling of other stuff, particularly mutable state. Also Rust is incredibly fast.

Now all it needs is a good GUI library...

In any case, if you made it through all this, I hope you gained a deeper appreciation for polymorphism, and the three-way trade-off between code reusability, semantic stability, and language complexity.

💖 💪 🙅 🚩
armousness
Sean Williams

Posted on July 5, 2022

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related

Command Pattern
design Command Pattern

November 28, 2024

Cloud computing: Evolution and use case
cloudcomputing Cloud computing: Evolution and use case

November 29, 2024

Bubble Sort in C
c Bubble Sort in C

November 28, 2024

Estruturas de Dados: Lista
datastructures Estruturas de Dados: Lista

November 27, 2024