Code Monk
Posted on February 25, 2024
While working on a recursive file-watcher I came upon an elegant way to express trees in Go. This approach works well in cases where each sibling node must have a unique value and that value cannot be the zero value (as in a filesystem). In practice, this is most trees.
A tree is a data structure that contains data in a hierarchy whereby one parent has as many children as it wants, and those children can have children, and so on.
Let us imagine a folder named "Primates", containing sub-folders, which themselves contain more sub-folders and/or text files:
.
└── Primates
└── Apes
└── Great Apes
├── African Apes
│ └── Gorillas
│ ├── Eastern Gorilla.txt
│ └── Western Gorilla.txt
├── Chimpanzees
│ ├── Bonobo
│ └── Chimpanzee
├── Homo Sapien.txt
└── Orangutans
├── Bornean Orangutan.txt
└── Sumatran Orangutan.txt
In terms of behaviour, strictly speaking, a tree needs only two methods (although more are often provided for convenience and optimisation).
type Node[T comparable] interface {
Data() T
Children []Node[T]
}
Where T
is the data we care about. In the example above, T
is string
, and that string represents the folder or file name. So we might begin to build our tree of life like so:
lifeTree := createTree[string]()
primates := lifeTree.AddChild("Primates")
apes := primates.AddChild("Apes")
// etc...
But how do we power this data structure, in terms of a concrete type? The canonical way is to use a struct.
type node[T comparable] struct {
data T
children []*node[T]
}
This is perfectly fine. But there is a more ergonomic way, made possible because of two important characteristics of maps in Go:
- a map can be defined recursively
- the zero-value for a map is
nil
// recusively defined node
type node[T comparable] map[T]node[T]
Here, we're defining a map type whose keys are the values we care about, and whose values are... other maps whose keys are values we care about.
Remember that Node
is the only object we need to define, because a tree can be said to be it's root node.
Keys of maps need not be simple strings or ints, but they need to satisfy the comparable type constraint.
Leaf nodes (in our example, text files) are nil maps. They terminate recursion. A recusive data structure is not useful without a base case. We do not wish to swallow the universe.
So how might we implement the Node[T]
interface using the node[T]
concrete type? Let's look at a basic interface, and then dive in.
type Node[T comparable] interface {
Data() T
Children map[T]Node[T]
Set(T)
Get(T) Node[T]
AddChild(T) Node[T]
RemoveChild(T)
}
Children()
This is the simplest. A node is precisely defined as it's children, so:
func (n *node[T]) Children() map[T]Node[T] {
return n
}
Set(T)
This will be used to set leaf nodes. To set non-leaf nodes (branches) we'll be using AddChild()
func (n *node[T]) Set(key T) {
n[key] = nil
}
Remember, key
is meaningful here. it's not a lookup key. It's the data we care about.
AddChild()
Here's where we get to define our hierachy and enable actual tree behaviour
func (n *node[T]) AddChild(key T) Node[T] {
branch := new(node[T])
n[key] = branch
}
Data()
This is a little bit trickier. Since a Node is entirely defined as the map that represents it's children, where does it's Data()
, the data we care about, come from?
The answer is that the Data()
of a node is the key of parent that contains that node. This makes perfect sense when you think about a filesystem. A folder name only makes sense in the context of where it sits in the hierarchy. The uniqueness constraint of it's name comes from it's parent. To change a folder name is to change nothing about the folder itself and it's behaviour, but it does change how it's parent addresses it.
Here is where you begin to see that it would be useful (though strictly speaking not necessary) for a node to have a link back to it's parent. To use our folder example, we would like to have our special ".."
file in our directory listing. It makes traversing the filesystem way easier, and certain operations way more efficient.
But how can this be done using a map? According to the Go Proverbs, we should Make the Zero Value Useful. Remember that we do not allow our users to use the zero-value (for example, a file or folder name cannot be blank). We will use that special value for ourselves.
So then, a constructor function might now look like this:
func New[T comparable](parent Node[T]) Node[T] {
var zeroK K // auto initialized to zero value
branch := new(node[T])
// tell the child where the parent is
branch[zeroK] = parent
return branch
}
// the root node has no parent
myExampleTree := New[string](nil)
myBranch := myExampleTree.AddChild("some branch")
... requiring us to augment certain methods:
func (n *node[T]) AddChild(key T) Node[T]) {
// tell the parent where the child is
n[key] = New[T](n)
}
In my view, this is an elegant approach to building trees satisfying the two aformentioned constraints (must be comparable, can't be zero). I wrote a more full-featured implementation of this that includes the following method set for common efficient tree operations.
type Node[K comparable] interface {
Data() K
Children() map[K]Node[K]
Spawn(K) Node[K]
RemoveChild(K)
Parent() Node[K]
Walk() [][]K
Ancestry() []K
IsTerminal() bool
Set(K)
Get(K) (Node[K], bool)
SetParent(Node[K])
fmt.Stringer
}
Use it: https://pkg.go.dev/github.com/sean9999/go-ergonomic-tree#section-readme
Fork it: https://github.com/sean9999/go-ergonomic-tree/tree/v0.0.1
Posted on February 25, 2024
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.