"When was the last time you used this?" - Part 1: Data Structures
Pawel Kadluczka
Posted on March 22, 2024
A candidate recently asked me, "When was the last time you used this data structure, if ever?"
The candidate admitted that as someone who worked on company internal tools, they hadn't needed to use more advanced data structures in years. They were genuinely curious about how often I dealt with problems where these data structures were useful.
Their question provoked me to review data structures I used at work, learned for interviews, or used to solve programming puzzles and think about how often I used them. I share my list below.
Caveat: Every software developer deals with different problems. I created the list based on my experience. If I haven't used a data structure, it doesn't mean that it is not used or not useful. Instead, it likely means I could solve my problems without it.
Dictionary
Dictionary is one of the most commonly used data structures. It can be applied to a wide range of problems. I use dictionaries daily.
Typical implementations of Dictionary use a hash table (with a linked list to handle collisions) or a balanced Binary Search Tree. Understanding the underlying data structure gives immediate insights into the cost of basic operations like insertion or lookup.
Nowadays, every modern programming language offers an implementation of Dictionary.
Set
Set is another data structure that I use very frequently. It is surprising how often we need to handle duplicates efficiently. Set shares a lot with Dictionary. These similarities make sense because Set could be considered a Dictionary without the value.
Linked list
I implemented linked lists several times at the beginning of my career over twenty years ago, but I haven't needed to do this since. Many standard libraries include implementations of linked lists, but again, I don't remember the last time I needed them.
Linked list-related questions used to be a staple of coding interviews, but fortunately, they are less popular these days.
Knowing how linked lists work could still be valuable because they are sometimes used to implement other data structures, such as stacks or queues.
Stack
While I rarely need to use stack directly, this data structure is extremely common.
Every program uses a stack to track invoked functions, parameter passing, and local data storage (the call stack).
Stack is a foundation for many algorithms, such as backtracking, tree traversal, and recursive algorithms. It is often used to evaluate arithmetic expressions and for syntax parsing. JVM (Java Virtual Machine) or CLR (Common Language Runtime) are implemented as stack machines.
Even though this happened long ago, I vividly remember reviewing a diff in which a recursive tree traversal was converted to the iterative version with explicit stack to avoid stack overflow errors for extremely deep trees.
Queue
Task execution management is one of the most common applications for queues: iOS uses the DispatchQueue
, WebServers queue incoming requests, and drinks at Starbucks are prepared on the FIFO (First-In, First-Out) principle.
I also use queues most frequently for task execution. My second most frequent use is solving Advent of Code puzzles with BFS (Breadth First Search), which uses a queue to store nodes to visit.
An interesting implementation fact about queues is that they often use a circular buffer under the hood for performance reasons. Implementations using linked lists are usually slower due to allocating and deallocating each node individually.
Heap / Priority Queue
I don't remember when I had to use the Heap data structure to solve a problem at work. I don't feel too bad about this (except when I forgot about Heap during an interview). Microsoft added the PriorityQueue
type only in .NET 6 - about 20 years after they shipped the first version of .NET Framework. Apparently, they, too, didn't consider Heap critical.
Although I didn't need to use Heap directly, I am sure some libraries I integrate my code use it. Heap is crucial to efficiently implementing many algorithms (e.g., Dijkstra, Kruskal's Minimum Spanning Trees).
Trees
It is challenging to talk about trees because they come in many shapes and colors. There are binary trees, n-ary trees, Binary Search Trees (BST), B-trees, Quadtrees, Octrees, and Segment Trees, to name just a few.
I have worked with (mostly n-ary) trees in every job. HTML and XML Document Object Models (DOM), C# Expression Trees, Abstract Syntax Trees, and domain-specific hierarchical data all require an understanding of the Tree data structure.
I have never had to implement a BST at work, but balanced BSTs are one way to implement (sorted) Dictionaries and Sets. For instance, the std::map
and std::set
containers in C++ are usually implemented as Red-black trees.
I used Quadtrees and Octrees only to solve a few Advent of Code puzzles that required spatial partitioning.
Graphs
I've only rarely had to use graphs for my daily job. In most cases, they were "natural" graphs - e.g., a dependency graph - that naturally formed an adjacency list.
Having said that, entire domains, such as Computer or Telecommunication Networks, Logistics, or Circuit Design, are built on graphs, so developers working in these domains work with graphs much more often.
This is my list. How about you? Are there data structures I haven't included, but you use them all the time? Or maybe you don't use some, which I consider a must. Please leave a comment - I'd love to know.
Image: Jorge Stolfi, CC BY-SA 3.0, via Wikimedia Commons
Posted on March 22, 2024
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.
Related
March 22, 2024