My favourite computer science text (OR "how regex works")

hkrogstie

Håvard Krogstie

Posted on June 13, 2020

My favourite computer science text (OR "how regex works")

One of my favourite things in CS is reading about algorithms not as a tutorial, but as more of a historical presentation. My favourite example is https://swtch.com/~rsc/regexp/regexp2.html,
which discusses how Nondeterministic Finite Automata modelled as virtual machine bytecode can be used instead of traditional backtracking algorithms when regex matching. All these terms are explained in the text.

The reason I like the text so much is that the algorithms are thoroughly presented and explained, with references to original papers from the 1960s by names like Rob Pike and Ken Thompson. It's nice to recognize names "in the wild", and then see examples of just what those people did. Much better than reading their Wikipedia pages imo.

And more familiar (and unfamiliar) names appear as the history is discussed. The algorithm by Thompson was used in the original Unix grep. Alfred Aho (the A in awk) is one of the authors of the infamous Dragon Book, where the technique is discussed, but without clear attribution, as several people have made DFAs and NFAs from regular expressions, without necessarily saying they do (Thompson didn't, he just made the machine code).

Aho is of course the guy from the Aho-Corasick string searching algorithm, which I randomly had to learn at some point to solve a hackerrank-task, but which I mainly remember specifically because the name Aho suddenly appeared everywhere else afterward (I had held the Dragon book, but didn't bother remembering any of the authors at first).

Anyways, the article I shared is part 2/3 of a series, which is my favourite part for reasons discussed above, but you should absolutely read the other parts if you like it.

💖 💪 🙅 🚩
hkrogstie
Håvard Krogstie

Posted on June 13, 2020

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related