Nested list flattening
Pete Tsamouris
Posted on January 1, 2020
Disclaimer
As per the previous post in this series, in the process of helping out the community of people alphatesting
the Unison
programming language, I am solving the 99 Haskell problems
in unison
, and making notes, tailored for a beginner-friendly audience.
One of the points of the 99 exercises
is to try alternative implementations: I will go out of my way (within reason) and explore such alternatives, even if a faster, more obvious, or easier approach is readily available.
Understanding of some concepts of functional programming might be required, and the intention of this series is to be tailored to a beginner-level audience. For all concepts mentioned in the post, I have made the effort to embed relevant hyperlinks, so in the event you encounter something you don't fully understand, feel free to either read to the end then search, or why not get a quick glimpse before you proceed.
Exercise 7: Flatten a nested list structure
Can I play too?
To follow along you will need:
- the Unison codebase manager
ucm
. Optionally in the$PATH
. - A new project created with
ucm -codebase path init
. -
ucm
running on your targetcodebase
:ucm -codebase path
. - The
standard library
(calledbase
), cloned (from insideucm
):pull https://github.com/unisonweb/base .base
. - An editor of your choice to open
scratch.u
, under thecodebase
location:ucm
will pick up any changes to thescratch file
, and provide output as necessary.
Wait, nested
lists are a thing?
Unison does support user defined types
, and these will be the focus of this post.
Consider the following definition, where the nested structure is abstracted behind a single type
signature:
type Nested a = Element a | List [Nested a]
Instances of the Nested
type, can contain:
- a single element of type
a
, or - a list of elements of this type, as noted by
[Nested a]
Element
and List
are the two constructors
of the algebraic data type
Nested
.
And it's as easy as that?
Let's ask ucm
:
⍟ These new definitions are ok to `add`:
type Nested a
Now evaluating any watch expressions (lines starting with `>`)... Ctrl+C cancels.
To give the semblance of uniformity with the rest of the Unison
data types, I will cd base
before adding
the type to the codebase. You can be reminded of the constructors' signatures by using ls
:
.> ls .base.Nested
1. Element (a -> Nested a)
2. List ([Nested a] -> Nested a)
So what about tests?
We begin by adding the signature of the flatten
method, with a default implementation, to be able to write unit tests.
The nested list
does encode structure in it, so I will try and cover as many types of structure in my tests:
flatten: Nested a -> [a]
flatten any = []
test> tests.flatten.empty =
run( expect( flatten ( List []) == []))
test> tests.flatten.elem =
run( expect( flatten ( Element 1) == [1]))
test> tests.flatten.elems =
run( expect( flatten ( List [Element 1, Element 2]) == [1, 2]))
test> tests.flatten.nestOne =
run( expect( flatten ( List [ List[Element 1]]) == [1]))
test> tests.flatten.nestedMany =
run
(expect
( flatten
(List
[ Element 1,
List [List [Element 2, List [Element 3]]],
Element 4 ])
== [1, 2, 3, 4]))
Here are the scenarios tested:
- Empty
List
- Single
Element
-
List
containing multipleElement
values -
List
inside aList
, containing anElement
-
Element
followed byList
, followed byList
followed byElement
The above could have been broken down to simpler, more granular test cases, but I find that this is quickly becoming tedious, and does not really offer much additional value to you, the reader. By all means, feel free to experiment with more granular test cases. The last one overlaps in coverage with the third and fourth.
ucm
agrees with the types of the above, and runs the tests, all of which, except the first one, are of course failing (omitted for brevity).
⍟ These new definitions are ok to `add`:
flatten : Nested a -> [a]
tests.flatten.elem : [Result]
tests.flatten.elems : [Result]
tests.flatten.empty : [Result]
tests.flatten.nestOne : [Result]
tests.flatten.nestedMany : [Result]
Can we now fill in the implementation of flatten
?
Let's by all means. Here's one way to go about it:
flatten: Nested a -> [a]
flatten e =
case e of
Element el -> [el]
List (l +: ls) -> flatten l ++ flatten (List ls)
List [] -> []
If flatten
encounters an Element
, which only wraps a single el
, all it needs to do is wrap it in a list. If it encounters a List
of any structure of nested values, it breaks it in head +: tail
, and separately recurses
into head
and tail
, concatenating the two calculated results with (++)
. This approach uses simple recursion
.
⍟ These new definitions will replace existing ones of the same name and are ok to `update`:
flatten : Nested a -> [a]
⍟ I've updated to these definitions:
flatten : .Nested a -> [a]
After the update
, tests
are passing!
◉ tests.flatten.empty : Passed 1 tests.
◉ tests.flatten.elem : Passed 1 tests.
◉ tests.flatten.nestOne : Passed 1 tests.
◉ tests.flatten.elems : Passed 1 tests.
◉ tests.flatten.nestedMany : Passed 1 tests.
Can we generalise
the recursion?
As mentioned in the first post of this series, since the original flatten
is using simple recursion, it can be expressed in terms of foldb
.
flatten: Nested a -> [a]
flatten e =
case e of
Element el -> [el]
List l -> foldb (flatten) (++) [] l
As a reminder, here is the foldb
signature (find
/view
), along with the type signatures, as well as orchestration to implement flatten
, layed out one by one.
.> view base.List.foldb
base.List.foldb : (a -> b) -> (b -> b -> b) -> b -> [a] -> b
base.List.foldb f op z as
(Nested a -> [a]) -> flatten
([a] -> [a] -> [a]) -> (++)
[a] -> []
[Nested a] -> l
foldb
starts (and ends prematurely if the input is empty), with []
. Whenever it encounters a component of [Nested a]
(beginning with the head
of l
), it runs the f
function, in this case flatten
. It then concatenates (++)
the output of op a
with the previously accumulated result.
After an update
, the tests
are still passing.
Can the recursion/foldb
be turned into an iteration
?
By relying on a small helper function, the last implementatoin can be expressed in differently formulated manner that avoids foldb
.
First, consider the following signature, where f
can turn a value of a
to a number of values of [b]
:
f: a -> [b]
f
can be passed to a function that, for the sake of uniformity with Haskell
, will be called concatMap
. concatMap
then iterates
over the [a]
and converts each instance to a [b]
.
I will at this point make a quick detour, and add tests for concatMap
: I rely on it to implement flatten
, so for the sake of thoroughness, it should be functioning as expected:
test> tests.concatMap.empty =
run (expect (concatMap (a -> [a]) [] == []))
tests.concatMap.idOne =
run (expect (concatMap (a -> [a]) [1] == [1]))
tests.concatMap.doubleOne =
run (expect (concatMap (a -> [a, a]) [1] == [1, 1]))
These tests pass
, and prove that concatMap
indeed only applies f
to each a
in the list, without performing any additional transformations. Feel free to be more thorough, but in the confines of Unison
, the type signatures can be trusted.
Going back to the task at hand, for our purposes f is flatten
, and a -> [b] is Nested a -> [a]
concatMap: (a -> [b]) -> [a] -> [b]
concatMap f as = foldl (acc a -> acc ++ f a) [] as
flatten: Nested a -> [a]
flatten e =
case e of
Element el -> [el]
List ls -> concatMap flatten ls
As before, if flatten
encounters an Element
, the a -> [a]
conversion is a "wrap in a list" operation. When it encounters a [Nested a]
, flatten
does not recurse
into itself by calling itself. Instead it calls concatMap
that uses flatten
as the function to work with. Internally, concatMap
runs foldl
.
⍟ These new definitions will replace existing ones of the same name and are ok to `update`:
flatten : Nested a -> [a]
.> update
⍟ I've updated to these definitions:
flatten : Nested a -> [a]
With tests passing, that's a wrap!
Posted on January 1, 2020
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.