Explain Regular Expressions Like I'm Five
Savvas Stephanides
Posted on July 29, 2020
Browsing Twitter, especially the #100DaysOfCode and #CodeNewbie hashtags, you're sure to find someone struggling with Regular Expressions, or "regex", and for good reason. Even experienced software developers are in the same boat. I'm with you. Regex still makes me dizzy, even after years of using it.
Thus, here is my attempt at an "Explain Like I'm Five" for regex:
Okay kids, let's begin.
Regular expressions are a way of finding specific parts of something written. A bit like finding a specific part of a story book, or a certain word in a song.
Actually let's do this now: let's begin with a random song:
Twinkle twinkle little star,
How I wonder what you are!
Up above the world so high!
Like a diamond in the sky!
Let's find some words:
1: Find the word "star" in the song
Twinkle twinkle little [star],
How I wonder what you are!
Up above the world so high!
Like a diamond in the sky!
Here it is, right there! On the first line of our song. That was easy!
Now let's try something else:
2: Find every character that's not a letter or a space!
Twinkle twinkle little star[,]
How I wonder what you are[!]
Up above the world so high[!]
Like a diamond in the sky[!]
Now that looked a little bit more complex than our first exercise. But it wasn't too difficult, was it?
The reason you found it slightly more difficult is that you weren't looking for a specific word this time. You were looking for something more general. You were looking for a... PATTERN!
You know patterns, right? They're on the shirt you're wearing, outside on the trees and leaves. They're everywhere!
Now let's try one more:
3: Find every word in the song that is 3 letters or less:
Twinkle twinkle little star,
[How] [I] wonder what [you] [are]!
[Up] above [the] world [so] high!
Like [a] diamond [in] [the] [sky]!
Whoa! Now that was quite a bit more involved, wasn't it? Go ahead and try it yourself!
Code talk
Now that you've familiarised yourself with the concept of "patterns", let's talk code. For this article, we're going to be coding in JavaScript, but the basic expressions shown here work much the same way in most programming languages!
So, say you need to express some patterns in code.
Find the word "star"
Firstly, let's find the word "star" in the "Twinkle Twinkle Little Star" song, and replace it with "⭐". You probably already know how to do this. It's quite simple:
First let's store our poem as a variable:
var poem = `Twinkle twinkle little star,
How I wonder what you are!
Up above the world so high!
Like a diamond in the sky!`
Now let's replace our text using the replace() function:
poem = poem.replace("star", "⭐")
console.log(poem)
This will be the output:
Twinkle twinkle little ⭐,
How I wonder what you are!
Up above the world so high!
Like a diamond in the sky!
Hurray! Just what we need!
Find every capital letter in the song
Now we're starting to look for patterns, not just certain words. We could iterate through every letter in every word and compare it to every capital letter in the English alphabet, but that's painful to even think about. Let's instead use a magical tool called REGULAR EXPRESSIONS!
Basically you need a way to tell your application "find any capital letter from A to Z". The regular expression for this is:
[A-Z]
That's it! Now let's use JavaScript to replace every capital letter with a "❤️":
var poem = `Twinkle twinkle little star,
How I wonder what you are!
Up above the world so high!
Like a diamond in the sky!`
poem = poem.replace(/[A-Z]/g, "❤️")
console.log(poem)
And here's the output:
❤️winkle twinkle little star,
❤️ow ❤️ wonder what you are!
❤️p above the world so high!
❤️ike a diamond in the sky!
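If you'd rather see which capital letters were found instead of replacing them, here is a minimal sketch using the standard match() string method. The variable name capitals is only for illustration and is not part of the example above:

```javascript
// Same song as above
const poem = `Twinkle twinkle little star,
How I wonder what you are!
Up above the world so high!
Like a diamond in the sky!`

// With the g flag, match() returns an array holding every match
const capitals = poem.match(/[A-Z]/g)
console.log(capitals) // [ 'T', 'H', 'I', 'U', 'L' ]
```

Five capital letters found, one for each heart in the output above.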
Find every small letter in the song
In the exact same way, we can find all small letters, but the expression this time is this:
[a-z]
Let's use JavaScript to replace all small letters with "🐶":
var poem = `Twinkle twinkle little star,
How I wonder what you are!
Up above the world so high!
Like a diamond in the sky!`
poem = poem.replace(/[a-z]/g, "🐶")
console.log(poem)
Output:
T🐶🐶🐶🐶🐶🐶 🐶🐶🐶🐶🐶🐶🐶 🐶🐶🐶🐶🐶🐶 🐶🐶🐶🐶,
H🐶🐶 I 🐶🐶🐶🐶🐶🐶 🐶🐶🐶🐶 🐶🐶🐶 🐶🐶🐶!
U🐶 🐶🐶🐶🐶🐶 🐶🐶🐶 🐶🐶🐶🐶🐶 🐶🐶 🐶🐶🐶🐶!
L🐶🐶🐶 🐶 🐶🐶🐶🐶🐶🐶🐶 🐶🐶 🐶🐶🐶 🐶🐶🐶!
I hope these make sense by now.
A couple of notes
Before we continue to our final example, let's clarify a few things:
- Notice how the letters in the regular expression are inside square brackets []? In regex, this simply means "any one character from this set of characters":
  - [A-Z] means any capital letter from A to Z
  - [a-z] means any small letter from a to z
  - [0-9] means any digit from 0 to 9
  - [A-Za-z0-9] means any character that is a capital letter, a small letter or a digit
- Notice how in the JavaScript code, the regex starts with / and ends with /g? The two slashes mark where the expression starts and ends, and the g after the closing slash is a flag that means "find every match in the text" (rather than just the first one). There are more flags you can use. For example, i means the search is "case-insensitive".
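To see what the flags and the square brackets actually change, here is a small sketch. The sample strings and variable names are made up for illustration; replace() and match() are the standard JavaScript string methods:

```javascript
const line = "Twinkle twinkle little star"

// No g flag: only the first small "t" is replaced
const firstOnly = line.replace(/t/, "_")
// g flag: every small "t" is replaced
const everyMatch = line.replace(/t/g, "_")
// g and i flags together: "T" and "t" are both replaced
const anyCase = line.replace(/t/gi, "_")

console.log(firstOnly)  // Twinkle _winkle little star
console.log(everyMatch) // Twinkle _winkle li__le s_ar
console.log(anyCase)    // _winkle _winkle li__le s_ar

// [0-9] picks out the digits one at a time
const digits = "4 little stars, 2 diamonds".match(/[0-9]/g)
console.log(digits) // [ '4', '2' ]
```

Notice that without the g flag the capital "T" at the start is still left alone, because /t/ only matches the small letter.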
Final example: Find words that are 3 letters or less and replace them with "π".
This is more complex, but I'll explain. The expression for this pattern is this:
\b[A-Za-z]{1,3}\b
I can see you shaking your head and gasping, so let's break this down:
- First, the familiar territory. Notice the [A-Za-z] there? If you remember, this means any letter, capital or small. So far so good, right?
- Next to it, you see {1,3}. This simply means the pattern before it should be repeated between 1 and 3 times. Basically, anywhere 1 to 3 letters appear next to each other. So, the words we need!
- Lastly, there's \b at each end. This simply means "word boundary". In other words, ignore parts of longer words that happen to contain 1 to 3 letters.
In summary, the pattern above basically means: "Find runs of 1 to 3 capital or small letters that are surrounded by word boundaries". Exactly what we need.
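To see why the word boundaries matter, here is a quick sketch that runs the pattern with and without \b on one line of the song. The variable names are only for illustration:

```javascript
const line = "How I wonder what you are!"

// Without \b, runs of up to 3 letters are also cut out of longer words
const withoutBoundaries = line.match(/[A-Za-z]{1,3}/g)
console.log(withoutBoundaries)
// [ 'How', 'I', 'won', 'der', 'wha', 't', 'you', 'are' ]

// With \b on both sides, only whole words of 1 to 3 letters match
const withBoundaries = line.match(/\b[A-Za-z]{1,3}\b/g)
console.log(withBoundaries)
// [ 'How', 'I', 'you', 'are' ]
```

Without the boundaries, "wonder" gets chopped into "won" and "der", which is not what we want.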
Let's now use JavaScript to replace these small words with "π"!
var poem = `Twinkle twinkle little star,
How I wonder what you are!
Up above the world so high!
Like a diamond in the sky!`
poem = poem.replace(/\b[A-Za-z]{1,3}\b/g, "π")
console.log(poem)
And here's the output:
Twinkle twinkle little star,
π π wonder what π π!
π above π world π high!
Like π diamond π π π!
WHOOP WHOOP! We made it!
That is all for now
I hope all this makes sense. I've only scratched the surface because there's a WHOLE lot more to regex, but I hope the basics make sense enough to get you started. Let me know how you found this article and happy regexing!
To learn more about regular expressions, here's a very useful cheat sheet.