RegEx - A Teeny, Tiny Taster
Shujaat Azim
Posted on April 19, 2021
Since beginning my coding journey, few topics have confused me the way RegEx has. I saw them as little more than gibberish, random symbols in between slashes that meant little to nothing. Thankfully, I wasn’t alone in my bewilderment, and I was able to eventually learn how to make them bend to my will (...kinda)!
First off, let me just clarify that RegEx is pronounced "reg-ex" and not “ree-jex” as some trolls have tried to perpetuate. It stands for “Regular Expressions,” with “regular” referring to their origins in the regular languages of formal language theory. This shared basis is why RegExes work, with largely the same core syntax, across most programming languages (“language-agnostic”), which adds to their usefulness. It also means that they are almost “purely” logical. However, as mere mortal humans, we are not logical beings; therefore, RegExes tend to be exceedingly confusing.
But confusing for what, exactly? Simply put, RegExes are used for describing patterns in strings. A pattern can be certain words, the ordering of letters, strange characters, spaces, and just about anything else you can think of that can go into a string. They allow us to specifically target certain “points of interest” in string data. For example, how would we target the underscore characters for removal in the following JavaScript and Ruby strings?
// javascript
let string = "Hello_my_name_is_Shujaat"
# ruby
string = "Hello_my_name_is_Shujaat"
Well, we could use a couple of built-in string and array methods to accomplish this:
JavaScript
let string = "Hello_my_name_is_Shujaat"
let splitString = string.split("_")
console.log(splitString)
// Logs ["Hello", "my", "name", "is", "Shujaat"]
let newString = splitString.join(" ")
console.log(newString)
// Logs "Hello my name is Shujaat"
Ruby
string = "Hello_my_name_is_Shujaat"
split_string = string.split("_")
p split_string
# Outputs ["Hello", "my", "name", "is", "Shujaat"]
new_string = split_string.join(" ")
puts new_string
# Outputs "Hello my name is Shujaat"
This is a purposefully simple example; it can all be done in fewer lines by omitting the variables and chaining the methods together. If the string were more complicated, perhaps with a bunch of different symbols, spaces, numbers, and capital letters all over the place, it would be significantly harder to make it readable to human eyes. HTML data, for instance, is just an insanely complicated string at the end of the day. But for brevity’s sake, let’s take a look at the following, slightly more complicated string:
"Hello_my!name&is8Shujaat"
It would take a separate pair of method calls for each of the weird characters in between the words to remove the nonsense characters and replace them with spaces. That isn't practical, even in this one-sentence string; so imagine how cumbersome it would be in an HTML doc!
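For comparison, here is roughly what the no-RegEx approach looks like in JavaScript, with one split/join pair per unwanted character. This is only a sketch to show how it scales; the variable names `messy` and `cleaned` are made up for illustration.

```javascript
// One split/join pair per unwanted character: "_", "!", "&" and "8".
let messy = "Hello_my!name&is8Shujaat";

let cleaned = messy
  .split("_").join(" ")
  .split("!").join(" ")
  .split("&").join(" ")
  .split("8").join(" ");

console.log(cleaned);
// Logs "Hello my name is Shujaat"
```

Every new "weird" character means another pair of calls, and this version only handles the digit 8, not digits in general.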
The solution? We can use RegEx to filter the string of all the non-letter characters and return the simple string:
JavaScript
let complexString = "Hello_my!name&is8Shujaat"
let regex = /[0-9_!&\s]/g
console.log(complexString.replace(regex, " "))
// Logs "Hello my name is Shujaat"
All I did here was write a RegEx literal: a pattern in between two slashes, with a global flag (g) added at the end. The square brackets form a "character class," which matches any ONE of the characters listed inside it. So /[0-9_!&\s]/g translates as "any digit from 0 to 9, OR an underscore, OR an exclamation mark, OR an ampersand, OR a whitespace character, across the WHOLE string."
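To see exactly which characters the pattern picks out, here is a small sketch using JavaScript's `.match()` method, which returns an array of every match when the RegEx has the g flag. The variable name `matches` is just for illustration.

```javascript
let complexString = "Hello_my!name&is8Shujaat";

// With the g flag, match() returns every matching character as an array.
let matches = complexString.match(/[0-9_!&\s]/g);

console.log(matches);
// Logs ["_", "!", "&", "8"]
```

Each entry is a single character, because a character class only ever matches one character at a time.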
The .replace() method takes two arguments, the "target" and the "replacement." Without RegEx, we would have to use a separate .replace() for EACH target, which quickly bloats and obfuscates our code. However, storing all the conditions in a variable using RegEx allows us to target everything at once! The global flag outside the slashes indicates that we would like to identify the targets across the whole string and replace them all with spaces (" ") - without it, we would stop at the first match by default.
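To see what the g flag actually changes, here is a quick sketch that runs the same pattern with and without it. The names `firstOnly` and `everyMatch` are made up for this example.

```javascript
let complexString = "Hello_my!name&is8Shujaat";

// No g flag: only the first match (the underscore) is replaced.
let firstOnly = complexString.replace(/[0-9_!&\s]/, " ");
console.log(firstOnly);
// Logs "Hello my!name&is8Shujaat"

// With the g flag: every match in the string is replaced.
let everyMatch = complexString.replace(/[0-9_!&\s]/g, " ");
console.log(everyMatch);
// Logs "Hello my name is Shujaat"
```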
Ruby
complex_string = "Hello_my!name&is8Shujaat"
new_string = complex_string.gsub(/[0-9_!&\s]/, " ")
puts new_string
# Outputs "Hello my name is Shujaat"
This is very similar to the JavaScript solution above, but it has a couple of important differences. We still need to create the set of conditions, and because this basic RegEx syntax is shared between the two languages, it's the same pattern we used before: /[0-9_!&\s]/
However, instead of .replace, we're using the .gsub method, which means "global substitution." Therefore, we don't need the g flag in the RegEx. (Ruby's RegEx literals don't have a g flag at all; to replace only the first match, you would use .sub instead.)
Phew!
You might be wondering if you have to memorize all the ridiculous conditions, the varying syntaxes, and all the different flag names. I have good news - you don't! There are many resources available that will help you set up your RegEx options, let you input your strings, and spit out a result. Two commonly used ones are:
JavaScript: https://regex101.com/
Ruby: https://rubular.com/
Here's a chart of common RegEx options too:
[Image: chart of common RegEx options]
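As a quick taste of the kinds of options such a chart covers, here is a small sketch that tries out a few standard JavaScript RegEx tokens. It is not a reproduction of the chart; the sample string and variable names are made up for illustration.

```javascript
let sample = "Order 66 shipped to Shujaat on 2021-04-19";

// \d matches a digit; + means "one or more of the previous thing".
let numbers = sample.match(/\d+/g);
console.log(numbers);
// Logs ["66", "2021", "04", "19"]

// [A-Z] matches one capital letter; \w* matches zero or more letters, digits or underscores.
let capitalized = sample.match(/[A-Z]\w*/g);
console.log(capitalized);
// Logs ["Order", "Shujaat"]

// ^ anchors to the start of the string; the i flag ignores case.
let startsWithOrder = /^order/i.test(sample);
console.log(startsWithOrder);
// Logs true

// {4} means "exactly four times"; $ anchors to the end of the string.
let endsWithDate = /\d{4}-\d{2}-\d{2}$/.test(sample);
console.log(endsWithDate);
// Logs true

// \s+ matches a run of whitespace, so this splits the string into words.
let words = sample.split(/\s+/);
console.log(words.length);
// Logs 7
```

Pasting any of these patterns into one of the tools above is a good way to see what each piece does.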
Yes, Regular Expressions are inherently unintuitive. The mere fact that conditions are chained together without spaces drove me nuts when I first learned about them. This in turn leads programmers, especially aspiring ones, to ignore them completely as a tool. In fact, I found many “how can I do this WITHOUT RegEx” questions on StackOverflow when I was poking around. But had I taken to RegEx earlier in my learning, I'd have solved many of my early coding challenges far more easily! So definitely don't ignore them; use all the tools available!
:)