elm

De-throning the List: Part SC4K

robinheghan

Robin Heggelund Hansen

Posted on April 23, 2018

De-throning the List: Part SC4K

In the last post we looked at how we could make Array more expressive. By which I mean we added a way for people to extend the data structure to support more use cases. But while third parties now can add missing functionality, there are certain functions which should be supported out of the box, both for performance and convenience reasons.

Elm has four built-in collection types: List, Array, Dict and Set. With the exception of Dict, these types look similar from a birds eye view. You can insert an item into the collection, get it back out, and even perform some task for every item stored. Of course, they do have different trade-offs which make them suitable for different situations.

The List is great when adding and retrieving items at the beginning (left side) of the collection. Array is great when you need fast access to any element, and Set is good when you need a sorted collection with no duplicates.

The API for these collections are different as well. Sometimes this makes sense, but not always. For instance, there is no sort function for Set, which is reasonable as Set is a sorted collection. However, there also isn't a sort function for Array, which means that you have to convert it to a List or a Set and then back again if you want it sorted.

Converting from one collection type to another is a costly operation. If you use an Array because it suits your use case most of the time, you don't want to convert to another collection just to perform that one operation.

Let's look at how we can improve the API of List, Array and Set to reduce the need for converting from one collection type to another to support the functionality you need.

Differences which should remain

Certain operations aren't supported on a collection type with good reason. For instance, there is not get or set operation on List because the performance would be bad. If one need to get or set a random item in a collection, Array should be used. Likewise, union, intersect and diff should only be supported for Set. Here are some functions that are not supported by all three collection types, which I believe shouldn't be supported either:

get, set, insert, remove, :: and push can only be supported efficiently by their respective collections. If you're in need of one of these functions, it's a tell-tale sign of which collection you should use. However, with the improvements I've mentioned in earlier posts, Array could support all of these functions efficiently, which is yet another reason as to why Array should be the default collection type.

indexedMap and toIndexedList make no sense for Set, as it doesn't operate by indexes and doesn't retain insertion order. repeat doesn't make much sense for Set either, as it cannot contain duplicates.

union, intersect and diff can only be efficiently implemented by Set.

head can be supported efficiently by the other collections, but the operation itself really only makes sense for List. Array would use get 0, and it isn't quite clear what head means in the case of Sets, is it the first/lowest/minimal element or the root/middle element?

sort, sortBy and sortWith doesn't make sense for Set, but Array should definitely get these functions.

reverse can be implemented efficiently for Array, but makes little sense for Set.

Where we could bridge the gap

Below is a list of functions which can be easily supported across all three collections:

  • singleton
  • initialize
  • range
  • concat
  • intersperse
  • member
  • map2, map3, map4, map5
  • concatMap
  • filterMap
  • partition
  • take
  • drop
  • unzip
  • slice
  • sum
  • product
  • maximum
  • minimum
  • all
  • any
  • scanl

While these functions could be supported across all three collections, does that mean they should? Since we're already messing about, let's ask ourselves if there are certain functions that shouldn't be in core to begin with.

sum and product are essentially shortcuts for List.foldl (+) 0 and List.foldl (*) 1. Are those operations common enough to grant them dedicated functions in the core library? If we introduce slice for all three collections, do we need take and drop?

Do people actually use map4 and map5? How about scanl and unzip? I mean, there probably are, but are those operations so common that one would expect to find them in core? Not that it necessarily does any harm, but there is something to be said of having a small, but powerful, package where you can read the entire documentation in a short sitting.

Anyway, here's a list of which functions I, personally, see no reason for having in core. I expect people will tell me that these functions definitely belong, but I think it would be nice to at least have the discussion.

  • intersperse
  • map4
  • map5
  • concatMap
  • unzip
  • sum
  • product
  • scanl
  • take
  • drop

Finishing touches

Here are some minor things I think shouldn't be left out of a API renovation:

List and Array has a length function, Set has size. This is likely due to the fact that Set is a thin abstraction on top of Dict and so has just copied the name. I would rename Set.size to Set.length out of consistency if nothing else.

Now that I'm on the topic, since Set is thin abstraction on top of Dict, maybe everything discussed up til this point applies to Dict as well?

List has :: and tail. The analogous for Array is push and pop. pop is missing from the API, so I would add it.

I would add find for all three collections. List and Set already supports member, which is essentially find except that it returns a boolean instead of the value for which the predicate function returns true. It's a more useful function in general, and so if member should be a part of the API (which I believe it should) then find belongs as well.

For Set and Dict I would throw in min and max to retrieve the minimum/first and maximum/last element.

I think Stack is a better name than List and so would rename that type. It's a name that more clearly explains what that collection type is good at.

Finally, as mentioned in the previous post, I think stoppableFoldl and stoppableFoldr, or something with the same functionality but with better names, should be added to the API of Array. I think Dict and Set could be well served with that functionality as well.

What do you think?

The goal I'm trying to reach is to make the Array type more general purpose, but when improving the Array API, it makes sense to look at the other collection types as well. This article lays out what I believe those APIs should look like.

First, I want to say that I'm not authority on what makes it into Elm. If I actually code up the suggestions in this article and submit it as a proposal, all of it could make it in, or none of it.

Second, all these suggestions are based on my own experience. What I need is your experience. Does these suggestions make sense, or am I off my rockers? Let me know in the discussion thread on the Elm Discourse.

And now, for something quite different

So, is this the end of this series? Not quite. There's a summary left the be published, but that will have to wait until after we go over the top (you didn't think I'd forgot, did you?) in the next post, which is about the literal representation of List and Array.

💖 💪 🙅 🚩
robinheghan
Robin Heggelund Hansen

Posted on April 23, 2018

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related