Regular Expressions (RegEx) Crash Course
codeSTACKr
Posted on May 29, 2020
In this article, we'll look at all of the essential parts of regular expressions, also referred to as "rej-ex" or "reg-ex". We'll talk about what regex is and how we can use it in JavaScript.
What are Regular Expressions?
They are a tool for finding patterns within a string of text.
There are several use cases. They are generally used to validate text from user input, or to search through text to either extract a portion or replace parts. This works very much like find and replace in a word processor.
Almost every programming language implements regular expressions. JavaScript, for instance, has support for regular expressions built-in.
Regular expressions can be difficult to learn because they look like gibberish to beginners.
The syntax is also not very intuitive. But if you take the time to understand and learn it, not only will you feel like you are decoding a German cipher, but you'll also see how powerful regular expressions can be.
Basics
The first tool that you need to bookmark is regexr.com. This site is essential!
You can enter an example of the text you will be searching, then start building your expression.
On the site, the expression is always contained within two forward slashes, and the site adds them automatically for you. This will be important later when we look at using regex in JavaScript, where a regex literal is written the same way.
For the most basic example, we can search for any character or string of characters literally. So we can enter "the" as the expression and it will find the first occurrence of "the".
Now we'll look at something called flags. These alter the way that the search is performed. The first flag that we'll add is the "global" flag. Notice that it adds a "g" after the closing forward slash. Now this search will find all occurrences of the expression "the".
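As a rough sketch of the same idea in JavaScript, here is a literal search with and without the global flag. The sample text and variable names are made up for illustration and are not the text used on regexr.com.

```javascript
// Hypothetical sample text, chosen only for this illustration.
const sampleText = "the knight rode past the castle and the river";

// Without the global flag, match() reports only the first occurrence.
const firstMatch = sampleText.match(/the/);
console.log(firstMatch[0], firstMatch.index); // "the" 0

// With the global flag, match() returns every occurrence.
const allMatches = sampleText.match(/the/g);
console.log(allMatches.length); // 3
```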
Wildcard
There are several special characters that can be used to modify the search.
One is the period. This character is like a wildcard. It will match any character or white space except for a newline. Since we still have the global flag turned on, this will match every character in our text.
Let's turn that flag off. Now it is only matching the first character of our text.
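A small sketch of the wildcard in JavaScript, assuming a made-up two-line string. It shows that the period skips the newline and that dropping the global flag leaves only the first character.

```javascript
// Hypothetical two-line sample text (12 characters including the newline).
const wildcardText = "Hi there\nBye";

// Global flag on: every character except the newline is matched.
const everyChar = wildcardText.match(/./g);
console.log(everyChar.length); // 11

// Global flag off: only the first character is matched.
const firstChar = wildcardText.match(/./)[0];
console.log(firstChar); // "H"
```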
Anchors
Another special character is the caret, ^. This will match characters at the beginning of the string. So if we add it in front of the wildcard, the result will not change, since the wildcard already matches the first character. Let's change the wildcard to "Knight". And that works too.
But if we change this to "the", you'll see that doesn't work since that is not at the beginning of our string.
We can also look for characters at the end of our string by using $. So let's change the expression back to the wildcard, ".", then add $. Notice that the last character is a white space.
If we want to find the last period, we have to escape it, because the period is a special character. We can escape special characters by putting a backslash before them. So let's add that. Now you'll see that it breaks, since the last character is not a period. So in our expression, we can add a space between the period and the dollar sign. Now that works.
We have two lines here. By default, it will search as one big chunk. But if we wanted to find the same thing at the end of each line, we could turn on the multi-line flag. That adds an "m" to the end of the expression. Now it matches the first occurrence of the period and space at the end of a line. To find both we will need to turn the global flag back on.
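Here is a minimal sketch of the anchors and the multi-line flag in JavaScript. The two-line string is an assumption made for this example; each line ends with a period followed by a space, like the text described above.

```javascript
// Hypothetical two-line text; each line ends with ". " (period + space).
const anchorText = "Knight Rider. \nA shadowy flight. ";

// ^ anchors to the beginning of the string.
const startsWithKnight = /^Knight/.test(anchorText); // true
const startsWithThe = /^the/.test(anchorText);       // false

// \. is an escaped (literal) period; $ anchors to the end of the string.
const endOfString = anchorText.match(/\. $/g);
console.log(endOfString.length); // 1

// With the "m" flag, $ also matches at the end of each line.
const endOfEachLine = anchorText.match(/\. $/gm);
console.log(endOfEachLine.length); // 2
```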
Character Classes
Ok, let's get a little more advanced. We can use \w to find any word character: alphanumeric characters and the underscore. We can also use \d to find any digit. These also have negative versions. Uppercase will search for the opposite. So \W will find any characters that are not word characters. And \D will find any characters that are not digits. We can also search for whitespace by using \s and of course any non-whitespace using \S.
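To make the character classes concrete, here is a short JavaScript sketch. The sample string is hypothetical and was picked so that each class has something to match.

```javascript
// Hypothetical sample: letters, a space, digits, an underscore, punctuation.
const classText = "Room 42_b!";

const wordChars = classText.match(/\w/g).join("");     // "Room42_b"
const digits = classText.match(/\d/g).join("");        // "42"
const nonWordChars = classText.match(/\W/g).join("");  // " !"
const nonDigits = classText.match(/\D/g).join("");     // "Room _b!"
const whitespaceCount = classText.match(/\s/g).length; // 1
const nonWhitespace = classText.match(/\S/g).join(""); // "Room42_b!"

console.log(wordChars, digits, nonWordChars, nonDigits, whitespaceCount, nonWhitespace);
```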
We can create character sets by using square brackets. [abc] will find any "a", "b", or "c" character. By default, the expression is case sensitive. We can turn that off by adding the case insensitive flag. That adds an "i" to the end of the expression. Now if we add "k" to the character set we'll see those results.
And of course, there is a way to negate this search. If we add the caret, ^, to the beginning of the set, everything that is not in the set will be found. And we can create character ranges. [a-z] will find any characters from a-z. Since we have case sensitivity turned off, this will find every letter. Let's turn the case sensitivity back on by removing the flag. Now, if we want to find all letters, uppercase or lowercase, we can add both ranges to the character set: [a-zA-Z]
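The sets, negated sets, and ranges described above might look like this in JavaScript. The sample string is an assumption for illustration only.

```javascript
// Hypothetical sample text with mixed case.
const setText = "A black Cab";

// Case sensitive by default: only lowercase a, b, c are found.
const lowerAbc = setText.match(/[abc]/g).join("");        // "bacab"

// The "i" flag turns case sensitivity off.
const anyCaseAbc = setText.match(/[abc]/gi).join("");     // "AbacCab"

// A caret at the start of the set negates it.
const notAbc = setText.match(/[^abc]/g).join("");         // "A lk C"

// Ranges: lowercase only, then both cases.
const lowerLetters = setText.match(/[a-z]/g).join("");    // "blackab"
const allLetters = setText.match(/[a-zA-Z]/g).join("");   // "AblackCab"

console.log(lowerAbc, anyCaseAbc, notAbc, lowerLetters, allLetters);
```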
Quantifiers
There are several ways that we can define the quantity of the characters we are searching for. \d will find the digits in our string. If we hover over these, we'll see that it is matching these individually.
To match all digits together, we can use \d*. The asterisk matches zero or more occurrences. This is a greedy search and will match as many as it can. Another way to match multiple is using plus, +. This time let's search for n+. Plus will find one or more occurrences of the character. Notice where the two n's are found together. We can also use a question mark, ?. The question mark makes the preceding character optional. It matches zero or one occurrence, so it doesn't care if it finds anything. (Placed after another quantifier, as in \d+?, the question mark instead makes that quantifier lazy, so it matches as few characters as it can.) Now, if we want to find a specific number of occurrences we can use curly braces. \d{3} will find three digits together. See how it groups the digits in threes and the last digit is left out?
We can also use \d{3,} to find three or more. Now it groups all of them. Lastly, we can use \d{3,6} to find anywhere from three to six digits. So here it matches the first six digits, then the last four.
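The quantifiers can be tried out in JavaScript along these lines. This is only a sketch; the ten-digit number and the other sample strings are made up for the example.

```javascript
// Hypothetical text containing a run of ten digits.
const digitsText = "call 5551234567 now";

const singleDigits = digitsText.match(/\d/g);       // ten separate matches
const oneOrMore = digitsText.match(/\d+/g);         // ["5551234567"]
const exactlyThree = digitsText.match(/\d{3}/g);    // ["555", "123", "456"]
const threeOrMore = digitsText.match(/\d{3,}/g);    // ["5551234567"]
const threeToSix = digitsText.match(/\d{3,6}/g);    // ["555123", "4567"]

// * means zero or more, so it also yields empty matches where no digit is present.
const starMatches = "a12".match(/\d*/g);

// + means one or more: the double "n" is matched together.
const nRuns = "running".match(/n+/g);               // ["nn", "n"]

// ? means zero or one: the "u" is optional.
const optionalU = "color colour".match(/colou?r/g); // ["color", "colour"]

console.log(singleDigits.length, oneOrMore, exactlyThree, threeOrMore, threeToSix);
console.log(starMatches, nRuns, optionalU);
```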
We can also search for two things by using the pipe character, |. This is like saying "or". For example, the|of will find all of the "the" and "of" words.
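A quick sketch of the "or" search in JavaScript, using a made-up sentence:

```javascript
// Hypothetical sample sentence.
const orText = "the end of the road";

// the|of matches either alternative, in the order they appear in the text.
const theOrOf = orText.match(/the|of/g);
console.log(theOrOf); // ["the", "of", "the"]
```

Keep in mind that this pattern is not limited to whole words, so it would also match the "the" inside a word like "other".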
Grouping
We can create groups by surrounding them with parentheses. So let's search for (\d{3}) with the global flag turned off. This will find the first 3 digits. If we hover over that, it will show us what is included in the group.
Let's say that this is a phone number. A very basic phone number search would be (\d{3})(\d{3})(\d{4}). Now when we hover, it shows all three groups.
So far we have only searched for characters. We can manipulate and even replace characters as well with regex.
So let's open the replace feature. With nothing here, it removes the matches. The default way to reference the groups is by using a dollar sign and the group number. So if we enter $1 we'll see the first group. Now let's enter $1-$2-$3. Now it's formatted like a phone number.
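The same grouping and replacement can be sketched in JavaScript with match() and replace(). The phone number here is a made-up value.

```javascript
// Three capture groups: 3 digits, 3 digits, 4 digits.
const phoneRx = /(\d{3})(\d{3})(\d{4})/;

// Hypothetical unformatted phone number.
const rawPhone = "5551234567";

// Index 0 is the full match; indexes 1..3 are the capture groups.
const phoneGroups = rawPhone.match(phoneRx);
console.log(phoneGroups[1], phoneGroups[2], phoneGroups[3]); // "555" "123" "4567"

// $1, $2, $3 in the replacement string refer to the groups by number.
const formattedPhone = rawPhone.replace(phoneRx, "$1-$2-$3");
console.log(formattedPhone); // "555-123-4567"
```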
Optionally, we can name the capture groups. We do that using ?<name> within the group. So if we wanted to identify the area code we could do this: (?<areacode>\d{3})(\d{3})(\d{4}).
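In JavaScript, a named group shows up on the match's groups property and can be referenced in a replacement string with $<name>. The following is a small sketch with a made-up number; the output format is just one possible choice.

```javascript
// The first group is named "areacode"; the other two stay numbered.
const namedRx = /(?<areacode>\d{3})(\d{3})(\d{4})/;

// Hypothetical phone number.
const namedMatch = "5551234567".match(namedRx);
console.log(namedMatch.groups.areacode); // "555"

// Named groups still count as numbered groups, so $2 and $3 are the remaining two.
const prettyPhone = "5551234567".replace(namedRx, "($<areacode>) $2-$3");
console.log(prettyPhone); // "(555) 123-4567"
```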
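We can exclude a group from being captured by adding ?: to the beginning of the group. This is called a non-capturing group. With one of our three groups excluded this way, we only have two capture groups.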
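As an illustration, here is the phone number pattern with the first group made non-capturing. Which group to exclude is an assumption made for this sketch.

```javascript
// (?:...) still has to match, but it is not stored as a capture group.
const nonCapturingRx = /(?:\d{3})(\d{3})(\d{4})/;

// Hypothetical phone number.
const nonCapturingMatch = "5551234567".match(nonCapturingRx);

// Full match plus two capture groups.
console.log(nonCapturingMatch.length); // 3
console.log(nonCapturingMatch[0]);     // "5551234567"
console.log(nonCapturingMatch[1]);     // "123"
console.log(nonCapturingMatch[2]);     // "4567"
```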
We can also do something called a lookahead. Knight(?= Rider) will match "Knight" that is followed by " Rider". Notice the space before Rider. This is called a positive lookahead. We can do a negative lookahead like this: Knight(?! Rider). This will match the opposite: "Knight" that is not followed by " Rider".
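A short JavaScript sketch of both lookaheads, using a made-up sentence. Note that the lookahead only checks what follows; the matched text itself is still just "Knight".

```javascript
// Hypothetical sample text with two occurrences of "Knight".
const lookaheadText = "Knight Rider and Knight Moves";

// Positive lookahead: "Knight" only when followed by " Rider".
const followedByRider = [...lookaheadText.matchAll(/Knight(?= Rider)/g)];
console.log(followedByRider.map((m) => m.index)); // [0]

// Negative lookahead: "Knight" only when NOT followed by " Rider".
const notFollowedByRider = [...lookaheadText.matchAll(/Knight(?! Rider)/g)];
console.log(notFollowedByRider.map((m) => m.index)); // [17]

// The lookahead is not part of the match.
console.log(followedByRider[0][0]); // "Knight"
```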
Password Example
In this example, we want to check the strength of a given password and confirm that it meets the given requirements. The requirement is that it has at least one of each of the following: an uppercase character, a lowercase character, a digit, and a special character. We also want to make sure that the password is at least eight characters long.
We'll use positive lookaheads to find digits, lowercase characters, uppercase characters, and special characters. Then we'll check that it has at least eight characters.
(?=.*[\d])(?=.*[a-z])(?=.*[A-Z])(?=.*[!@#$%^&*]).{8,}
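Here is a minimal sketch of how this check could be run in JavaScript, assuming each lookahead uses .* (zero or more of any character) before its character set. The sample passwords are made up for the example.

```javascript
// Four lookaheads (digit, lowercase, uppercase, special), then at least 8 characters.
const passwordRx = /(?=.*[\d])(?=.*[a-z])(?=.*[A-Z])(?=.*[!@#$%^&*]).{8,}/;

// Hypothetical passwords for illustration.
const strongResult = passwordRx.test("Passw0rd!");    // true: meets every requirement
const noDigitResult = passwordRx.test("Password!");   // false: no digit
const noSpecialResult = passwordRx.test("Passw0rdd"); // false: no special character
const tooShortResult = passwordRx.test("Pw0!");       // false: fewer than 8 characters

console.log(strongResult, noDigitResult, noSpecialResult, tooShortResult);
```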
In JavaScript
Ok, now let's see how we can use this in JavaScript.
A regular expression in JavaScript is an object. We can define it in two ways.
const regexFromConstructor = new RegExp('hello'); // constructor syntax
const regexFromLiteral = /hello/; // literal syntax
This defines the regex pattern.
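One practical difference between the two forms, sketched below: the constructor takes the pattern as a string, so backslashes have to be doubled and flags are passed as a second argument. It also lets us build a pattern from a variable. The variable names and sample values here are hypothetical.

```javascript
// Literal syntax: flags go after the closing slash.
const digitsLiteral = /\d+/g;

// Constructor syntax: the backslash is escaped inside the string, flags are the second argument.
const digitsConstructor = new RegExp("\\d+", "g");

// Both describe the same pattern and flags.
console.log(digitsLiteral.source === digitsConstructor.source); // true
console.log(digitsConstructor.flags); // "g"

// The constructor is handy when the pattern comes from a variable.
const searchWord = "hello"; // hypothetical value, e.g. from user input
const dynamicRx = new RegExp(searchWord, "i");
console.log(dynamicRx.test("Hello World")); // true
```

If the variable can contain special characters such as a period or a plus sign, they would need to be escaped before being passed to the constructor.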
We can test strings for matches by using .test(). This will return a boolean for the match.
const rx = /hello/;
const result = rx.test('hello world'); // true
We can also search strings using the string method .search(). This will return the index of the first match, or -1 if there is no match.
const str = "hello world";
const rx = /world/;
const result = str.search(rx); // 6
And we can replace portions of the string by using the string method .replace(). The first parameter is the regular expression and the second parameter is the replacement.
const str = "YouTube is Awesome!";
const rx = /YouTube/;
const result = str.replace(rx, "codeSTACKr"); // "codeSTACKr is Awesome!"
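One detail worth showing: just like on regexr.com, .replace() only touches the first match unless the regex has the global flag. A small sketch with a made-up string:

```javascript
// Hypothetical sample text with three occurrences of "cat".
const repeatedText = "cat, cat, cat";

// Without the global flag, only the first match is replaced.
const replacedFirst = repeatedText.replace(/cat/, "dog");
console.log(replacedFirst); // "dog, cat, cat"

// With the global flag, every match is replaced.
const replacedAll = repeatedText.replace(/cat/g, "dog");
console.log(replacedAll); // "dog, dog, dog"
```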
Thanks for reading!