A developer tries to explain regular expressions

gjorgivarelov

Posted on November 4, 2021

As a developer, at any given time you have quite a few projects that need your attention, and the deadlines are looming. With the pressure on, you'd rather dedicate your time to those projects, and possibly add even more to your to-do list, than turn your focus to a time-consuming study of an infinitely nuanced craft with opaque explanations.

Some of us learned to shy away from regular expressions, some learned to use them, and some even use them with enthusiasm. Hopefully this post will help us all turn into regex enthusiasts. Ideally, the reader is an aspiring developer who has found this post before committing to the study of regular expressions.

Why do regular expressions need to be explained through metaphors familiar to developers?

Sure, rote memorization of regular expression features (and the subsequent loss of what was learned) may look like the easiest approach to learning regular expressions. Memorize what each token stands for, make notes and, when the need arises, go back to those notes to "refresh" the skill you "learned" (basically, re-learn it).
Instead of following the typical pace of a regular expressions tutorial, with its list of regex features and an explanation of each, let's try to understand regular expressions and, as a result of that understanding, use them later almost without effort. The goal is to avoid memorization and the subsequent loss of what's learned. Imagine having to look up in your notes the meaning of the question mark in regular expressions: its meaning changes with the context it is used in! How much time would you save if you simply knew the meaning of the question mark at that given moment! If we can put this seemingly opaque craft in a context familiar to a developer, we'll all save ourselves from the resentment of learning and using regex.
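To make the question mark remark concrete, here is a minimal sketch in Python's `re` module (my choice of language for illustration; the post itself does not name one). The sample strings and variable names are made up for the example, and the three roles shown are common ones, not an exhaustive list.

```python
import re

# 1. Right after a token, "?" is a quantifier: the token is optional.
optional = re.findall(r"colou?r", "color colour")
print(optional)  # ['color', 'colour']

# 2. Right after another quantifier, "?" makes that quantifier lazy,
#    so it matches as little text as possible.
greedy = re.findall(r"<.+>", "<a><b>")
lazy = re.findall(r"<.+?>", "<a><b>")
print(greedy)  # ['<a><b>']
print(lazy)    # ['<a>', '<b>']

# 3. Right after an opening parenthesis, "?" introduces special group
#    syntax, here a non-capturing group.
non_capturing = re.findall(r"(?:ab)+", "ababab ab")
print(non_capturing)  # ['ababab', 'ab']
```

Same character, three different jobs, and the only thing that tells them apart is what sits immediately before it.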

Declarative programming metaphors and regular expressions

Going through the list of regular expression features, it would seem that almost all of it is declarative programming. Data is described through tokens and is filtered according to the particular tokens used in a given pattern. You don't have to construct the logical engine that will sift through the data. That's already done for you, in much the same way that the meanings of HTML tags are already defined and the rules are already set for how the browser will render the tagged data. String anchors, word boundaries and character classes are obvious equivalents to tags in declarative programming. Not so obvious: quantifiers, the negation operation and groups.
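A small sketch of that declarative flavor, again assuming Python's `re` module and an invented sample string: each pattern only states what the text should look like, and the engine does the sifting.

```python
import re

text = "cat catalog concat 42 cats"

# Word boundary: "cat" only as a whole word.
whole_words = re.findall(r"\bcat\b", text)
print(whole_words)  # ['cat']

# String anchors: does the text start or end with "cat"?
starts_with_cat = re.search(r"^cat", text) is not None
ends_with_cat = re.search(r"cat$", text) is not None
print(starts_with_cat, ends_with_cat)  # True False

# Character class plus quantifier: one or more digits.
numbers = re.findall(r"[0-9]+", text)
print(numbers)  # ['42']

# Negation inside a character class: runs of anything that is not a space.
tokens = re.findall(r"[^ ]+", text)
print(tokens)  # ['cat', 'catalog', 'concat', '42', 'cats']
```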

One could even go so far as to construct a rudimentary DOM by naming or numbering groups of tokens. Want to control how far you can iterate through that DOM? Sure, that's what the flags are for (global, multiline, case-insensitive and others). Nesting tags? Well, that's what the groups are for.
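Here is a rough sketch of the "rudimentary DOM" idea, assuming Python's `re` module. The group names (`date`, `year`, `month`) and the sample strings are hypothetical. Note that Python has no global flag as JavaScript does; `re.findall` already returns every match, so only the multiline and case-insensitive flags appear here.

```python
import re

# Named groups behave a bit like labelled nodes; the outer "date" group
# nests the "year" and "month" groups.
pattern = re.compile(r"(?P<date>(?P<year>\d{4})-(?P<month>\d{2}))")
match = pattern.search("Posted on 2021-11")
date_parts = match.groupdict()
print(date_parts)  # {'date': '2021-11', 'year': '2021', 'month': '11'}

# Flags change how far the same pattern reaches.
lines = "Regex\nregex\nREGEX"
no_flags = re.findall(r"^regex$", lines)
multiline = re.findall(r"^regex$", lines, re.MULTILINE)
both = re.findall(r"^regex$", lines, re.MULTILINE | re.IGNORECASE)
print(no_flags)   # []
print(multiline)  # ['regex']
print(both)       # ['Regex', 'regex', 'REGEX']
```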

Imperative programming metaphors and regular expressions

Declarative programming metaphors cover a lot of regular expression features, but if you prefer imperative programming, regular expressions have not left you out. It may be the only feature that is equivalent to an if-then statement, but it can be used to great effect when processing text with regular expressions: the lookaround.

Or actually lookarounds, because you can look ahead or you can look behind. Meaning: only accept a match where the surrounding text meets a given condition. Looks a lot like an "if-then" statement, doesn't it? I was so happy to learn about and use this feature, since "if-then" is very familiar territory to anyone who has even tried to learn coding. You can instruct your machine to accept a match only when the condition holds for the text right after it (lookahead) or right before it (lookbehind). Perhaps you are called to match anything but a given condition? Negative lookarounds to the rescue. Meaning: check that the surrounding text does not meet a given condition, and only if it doesn't do you accept the match.
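The lookaround variants can be sketched as follows, assuming Python's `re` module and a made-up `prices` string. The condition inside each lookaround is checked but never becomes part of the match, which is what gives it the "if" feel.

```python
import re

prices = "cost: $40, weight: 40kg, discount: 15%, refund: $7"

# Positive lookbehind: digits only IF a "$" comes right before them.
dollar_amounts = re.findall(r"(?<=\$)\d+", prices)
print(dollar_amounts)  # ['40', '7']

# Positive lookahead: digits only IF "kg" comes right after them.
weights = re.findall(r"\d+(?=kg)", prices)
print(weights)  # ['40']

# Negative lookbehind: digits only if NOT preceded by "$".
# The class also excludes a preceding digit; otherwise the engine would
# happily match the "0" of "$40", because "0" is preceded by "4", not "$".
not_dollars = re.findall(r"(?<![\$\d])\d+", prices)
print(not_dollars)  # ['40', '15']
```

The extra `\d` in the negative lookbehind is a design choice for this example: negative lookarounds are checked at every position the engine tries, so the condition has to rule out partial matches too.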

Another imperative programming metaphor in regular expressions is the direction of execution. We are all familiar with the top-down direction of execution from when we were taught imperative coding. Regular expressions have their own direction of execution, and that direction becomes most obvious with one of the simplest matches: a literal character followed by a quantifier. It's what's IN FRONT of the quantifier, the token immediately before it, that decides what will be repeated in the text you are rifling through. Imperative programming has its top-down sequence of execution; regular expressions have their "what's in front of the token is what'll match" direction.
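A quick sketch of that rule, assuming Python's `re` module and an invented sample string: the same `+` repeats different things depending on the single token that comes right before it.

```python
import re

text = "ab abb abab"

# "+" applies only to the literal "b" right before it.
repeat_b = re.findall(r"ab+", text)
print(repeat_b)  # ['ab', 'abb', 'ab', 'ab']

# Wrap "ab" in a group and the group becomes the token in front of "+".
repeat_ab = re.findall(r"(?:ab)+", text)
print(repeat_ab)  # ['ab', 'ab', 'abab']

# Put a character class in front and "+" repeats "a or b".
repeat_class = re.findall(r"[ab]+", text)
print(repeat_class)  # ['ab', 'abb', 'abab']
```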

I hope this post eases some of the resentment both aspiring and accomplished developers might have when trying to grasp and successfully use regular expressions in their work. Instead of writing a bonus paragraph on the equivalents of logical and bitwise AND, OR and XOR in regular expressions myself, I will leave that writing to you, dear readers: share your thoughts on the subject in the comment section. Logical OR would be the pipe symbol. But what about AND? Or XOR?

P.S.

I highly recommend taking Bonnie Schulkin's course on regular expressions on Udemy. I must admit that I had been shying away from regex for years, as my notes on the subject kept stacking up, until I took her course, and I learned a lot about regular expressions from it.
