Demystifying Regular Expressions with JavaScript
Juan Cruz Martinez
Posted on July 2, 2020
The first time I encountered a regular expression was many years ago, but I still remember my first thoughts on it:
- What is this string-like thing?
- I'd rather not touch it, it looks scary
I don't remember exactly what that regex was doing or what it looked like, but it scared me to death. In retrospect, I realize that it was probably not scary at all and that it was in fact an easy way to solve the problem at hand. So why did I get this feeling? It comes down to the awkwardness of the syntax: regular expressions certainly look strange, and if you don't know what they are, they look very complicated.
My intention here is not to scare you off. Regex can be simple once we understand it, but if you don't and you look at something like this:
^\(*\d{3}\)*( |-)*\d{3}( |-)*\d{4}$
it can be intimidating...
Today we are going to demystify regular expressions. We will see what they are, what they are useful for, and how you can design your own regular expressions to solve problems.
What are regular expressions
Regular expressions are a way to describe patterns in data strings. They have their own syntax, as if they were their own programming language, and there are methods and ways to interact with regular expressions in most (if not all) programming languages.
But what kind of patterns are we talking about? Common examples of regular expressions determine, for example, whether a given string is an email address or a phone number, or verify that a password meets certain complexity requirements.
Once you have the pattern, what can you do with the regular expressions?
- validate a string with the pattern
- search within a string
- replace substrings in a string
- extract information from a string
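As a rough sketch of those four uses, the snippet below applies each one to a made-up log line. The sample string and the variable names are assumptions for illustration only; the methods themselves are covered in the sections that follow.

```javascript
// Hypothetical sample input, used only for illustration
const line = 'user=juan id=42'

// 1. Validate: does the whole string follow the expected shape?
const isValid = /^user=\w+ id=\d+$/.test(line)

// 2. Search: at which index does the first digit appear?
const firstDigitAt = line.search(/\d/)

// 3. Replace: mask every digit
const masked = line.replace(/\d/g, '#')

// 4. Extract: pull the id out with a capturing group
const id = line.match(/id=(\d+)/)[1]

console.log(isValid, firstDigitAt, masked, id)
// true 13 'user=juan id=##' '42'
```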
Working with regular expressions
For this article, we are going to cover how to work with regular expressions in JavaScript, though the concepts learned here apply to other languages as well. With that said, in other languages, there may be some differences in the way they treat regular expressions.
Let's look at an example that will validate if the string contains the word Hello
or not.
In JavaScript there are two ways to go about this:
- Constructor
- Literal
Constructor
const regex = new RegExp('Hello')
const result = regex.test('Hello World!')
console.log(result)
--------------
Output
--------------
true
Literal
const regex = /Hello/
const result = regex.test('Hello World!')
console.log(result)
--------------
Output
--------------
true
In both scenarios, the variable regex is an object that exposes different methods we can use to interact with the regular expression. The first example has a more familiar look: it instantiates an object with a string as a parameter. The second looks a bit weird: there is something that resembles a string, but instead of quotes it is wrapped in /. As it turns out, both forms represent the same thing. I personally like the second option, which is very clean, and IDEs or code editors can apply syntax highlighting to the regular expression, whereas in the first scenario the pattern is defined as just a string.
So far our regular expression has been fairly simple: just an exact match on the string Hello. It worked perfectly in JavaScript; however, the result can be different in other languages, even though the regular expression is the same. This is because each programming language can define certain defaults or special behaviors for its regular expressions, and these vary from one language to another. Sorry about that, but that is just how it is. A regex will look mostly the same in most programming languages, but before you use it somewhere else you will have to test it and adjust it if necessary.
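One practical difference between the two forms is worth a small sketch: the constructor takes an ordinary string, so backslashes have to be doubled, but in exchange the pattern can be assembled at runtime. The variable word below is a hypothetical value, not something from the examples above.

```javascript
// The literal form keeps the pattern readable
const literal = /\d+/

// In the string form, each backslash must itself be escaped
const fromString = new RegExp('\\d+')

// The constructor helps when part of the pattern is only known at runtime
const word = 'World' // hypothetical value, e.g. coming from user input
const dynamic = new RegExp('Hello ' + word)

console.log(literal.source === fromString.source) // true, same pattern
console.log(dynamic.test('Hello World!')) // true
```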
Different uses of regular expressions
When working with regular expressions we are basically working with the RegExp object methods, or with string methods which allow us to interact with regular expressions.
RegExp.prototype.test()
The test() method executes a search for a match between a regular expression and a specified string. It returns true or false.
Example: Check whether the specified string contains the string foo
const str = 'table football'
const regex = RegExp('foo')
console.log(regex.test(str))
-------------
Output
-------------
true
RegExp.prototype.exec()
The exec()
method executes a search for a match in a specified string. Returns a result array, or null.
Example: Look for all the instances of foo
in the given string
const str = 'table football, foosball'
const regex = /foo/g
let result;
while ((result = regex.exec(str)) !== null) {
console.log(`Found ${result[0]} at ${result.index}.`);
}
-------------
Output
-------------
Found foo at 6.
Found foo at 16.
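The loop above works because a regex with the g flag is stateful: it remembers where the last match ended in its lastIndex property. A minimal sketch of that behavior, reusing the same string (the variable names are only illustrative):

```javascript
const str = 'table football, foosball'
const regex = /foo/g

regex.test(str)
const afterFirst = regex.lastIndex // 9, just past the first "foo"

regex.test(str)
const afterSecond = regex.lastIndex // 19, just past the second "foo"

const third = regex.test(str) // false, there is no third match
const afterThird = regex.lastIndex // 0, the position is reset after a failed match

console.log(afterFirst, afterSecond, third, afterThird)
```

This is also why reusing one global regex across different strings can give surprising results; creating a fresh regex or resetting lastIndex to 0 avoids it.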
String.prototype.match()
The match()
method retrieves the result of matching a string against a regular expression.
Example: Find all the capital letters in a string
const paragraph = 'The quick brown fox jumps over the lazy dog. It barked.'
const regex = /[A-Z]/g
const found = paragraph.match(regex)
console.log(found)
-------------
Output
-------------
Array ["T", "I"]
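The shape of the value returned by match() depends on the g flag, which is easy to trip over. The sketch below contrasts the cases; the fallback to an empty array is just one possible way to handle the no-match case, not the only one.

```javascript
const text = 'The quick brown fox jumps over the lazy dog. It barked.'

// With g: an array with every full match
const all = text.match(/[A-Z]/g) // ['T', 'I']

// Without g: only the first match, plus extra details such as index
const first = text.match(/[A-Z]/)

// No match at all returns null, not an empty array
const none = text.match(/\d/g)
const safe = text.match(/\d/g) || []

console.log(all, first[0], first.index, none, safe.length)
```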
String.prototype.matchAll()
The matchAll()
method returns an iterator of all results matching a string against a regular expression, including capturing groups.
Example: Find occurrences of a string in groups
const regexp = /t(e)(st(\d?))/g
const str = 'test1test2'
const arr = [...str.matchAll(regexp)]
console.log(arr)
-------------
Output
-------------
(2) [Array(4), Array(4)]
-> 0: Array(4)
0: "test1"
1: "e"
2: "st1"
3: "1"
groups: undefined
index: 0
input: "test1test2"
lastIndex: (...)
lastItem: (...)
length: 4
__proto__: Array(0)
-> 1: Array(4)
0: "test2"
1: "e"
2: "st2"
3: "2"
groups: undefined
index: 5
input: "test1test2"
lastIndex: (...)
lastItem: (...)
length: 4
__proto__: Array(0)
lastIndex: (...)
lastItem: (...)
length: 2
String.prototype.search()
The search()
method executes a search for a match between a regular expression and this string object. It returns the index at which the match happened, or -1 if there is no match.
Example: Find the position of any character that is not a word character or white space
const paragraph = 'The quick brown fox jumps over the lazy dog. If the dog barked, was it really lazy?'
// any character that is not a word character or whitespace
const regex = /[^\w\s]/g;
console.log(paragraph.search(regex));
console.log(paragraph[paragraph.search(regex)]);
-------------
Output
-------------
43
.
String.prototype.replace()
The replace()
method returns a new string with some or all matches of a pattern replaced by a replacement. The pattern can be a string or a RegExp, and the replacement can be a string or a function to be called for each match. If the pattern is a string, only the first occurrence will be replaced.
Note that the original string will remain unchanged.
Example: Replace the word dog with monkey
const paragraph = 'The quick brown fox jumps over the lazy dog. If the dog barked, was it really lazy?'
const regex = /dog/gi
console.log(paragraph.replace(regex, 'monkey'))
console.log(paragraph.replace('dog', 'monkey'))
-------------
Output
-------------
The quick brown fox jumps over the lazy monkey. If the monkey barked, was it really lazy?
The quick brown fox jumps over the lazy monkey. If the dog barked, was it really lazy?
Make no mistake here: when we pass a string as the pattern, replace() treats it as literal text, not as a regular expression, and replaces only the first occurrence. That is why, in the second console.log, the word dog got replaced only once. To replace every occurrence with replace(), use a regular expression with the g flag, which we will cover later.
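The replacement argument is more flexible than a fixed string. As a small sketch, the snippet below uses the special $& token (the matched text) and then a replacement function; the shortened sample sentence is an assumption for brevity.

```javascript
const paragraph = 'The lazy dog. If the dog barked, was it really lazy?'

// $& in the replacement string stands for the text that was matched
const marked = paragraph.replace(/dog/g, '[$&]')

// A function receives each match and returns its replacement
const upper = paragraph.replace(/dog/g, (match) => match.toUpperCase())

console.log(marked) // The lazy [dog]. If the [dog] barked, was it really lazy?
console.log(upper)  // The lazy DOG. If the DOG barked, was it really lazy?
```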
String.prototype.replaceAll()
The replaceAll()
method returns a new string with all matches of a pattern replaced by a replacement. The pattern can be a string or a RegExp, and the replacement can be a string or a function to be called for each match. When the pattern is a RegExp, it must have the g flag, otherwise a TypeError is thrown.
Example: Replace the word dog with monkey
const paragraph = 'The quick brown fox jumps over the lazy dog. If the dog barked, was it really lazy?'
const regex = /dog/gi
console.log(paragraph.replaceAll(regex, 'monkey'))
console.log(paragraph.replaceAll('dog', 'monkey'))
-------------
Output
-------------
The quick brown fox jumps over the lazy monkey. If the monkey barked, was it really lazy?
The quick brown fox jumps over the lazy monkey. If the monkey barked, was it really lazy?
Similar to before, but now we replace all the matches. I usually avoid this function, as I can always get the same result with replace() and a regular expression using the g flag. On top of that, at the time of writing replaceAll() is not supported on all platforms and browsers.
String.prototype.split()
The split()
method divides a String into an ordered set of substrings, puts these substrings into an array, and returns the array. The division is done by searching for a pattern; where the pattern is provided as the first parameter in the method's call.
Example:
const str = 'a1 b2 c3 d4 la f5'
const sections = str.split(/\d/);
console.log(sections)
-------------
Output
-------------
[ 'a', ' b', ' c', ' d', ' la f', '' ]
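A common use of split() with a regex is to accept several separators at once. The sketch below uses a made-up list with inconsistent punctuation, and also shows that wrapping the separator in a capturing group keeps the separators in the result.

```javascript
// Hypothetical input with mixed separators and stray spaces
const csv = 'apples, oranges;bananas ,  kiwis'

// Split on a comma or semicolon, swallowing any surrounding whitespace
const items = csv.split(/\s*[,;]\s*/)

// With a capturing group, the matched separators are kept in the array
const withSeparators = 'a1b2c'.split(/(\d)/)

console.log(items)          // [ 'apples', 'oranges', 'bananas', 'kiwis' ]
console.log(withSeparators) // [ 'a', '1', 'b', '2', 'c' ]
```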
Building regular expressions
Now that we know how to work with regular expressions and the different methods which are available to interact with them, let's spend some time building regular expressions to match the patterns we want.
Anchoring
/hello/ will match hello wherever it appears inside the string. If you want to match strings that start with hello, use the ^ operator:
/^hello/.test('hello world') //✅
/^hello/.test('from JS, hello world') //❌
If you want to match strings that end with world, use the $ operator:
/world$/.test('hello world') //✅
/world$/.test('hello world!') //❌
You can also combine them to find exact matches:
/^hello$/.test('hello') //✅
To find strings with wildcards in the middle you can use .*, which matches any character repeated 0 or more times:
/^hello.*Juan$/.test('hello world Juan') //✅
/^hello.*Juan$/.test('helloJuan') //✅
/^hello.*Juan$/.test('hello Juan!') //❌
Match items by character or numeric range
One very cool feature of regular expressions is the ability to match by character or numeric range. What do I mean by range? Something like:
/[a-z]/ // a, b, c ..., x, y, z
/[A-Z]/ // A, B, C ..., X, Y, Z
/[a-d]/ // a, b, c, d
/[0-9]/ // 0, 1, 2, ..., 8, 9
These types of regex patterns will match when at least one of the characters in the string falls within the range:
/[a-z]/.test('a') //✅
/[a-z]/.test('1') //❌
/[a-z]/.test('A') //❌
/[a-d]/.test('z') //❌
/[a-d]/.test('zdz') //✅
You can also combine ranges:
/[a-zA-Z0-9]/.test('a') //✅
/[a-zA-Z0-9]/.test('1') //✅
/[a-zA-Z0-9]/.test('Z') //✅
Negating a pattern
We saw that the ^ character at the beginning of a pattern anchors it to the beginning of a string. However, when used as the first character inside a range, it negates it, so:
/[^a-zA-Z0-9]/.test('a') //❌
/[^a-zA-Z0-9]/.test('1') //❌
/[^a-zA-Z0-9]/.test('Z') //❌
/[^a-zA-Z0-9]/.test('@') //✅
Meta-characters
There are special characters in regular expressions that take on a special meaning. Some of them are:
- \d: matches any digit, equivalent to [0-9]
- \D: matches any character that’s not a digit, equivalent to [^0-9]
- \w: matches any alphanumeric character (plus underscore), equivalent to [A-Za-z_0-9]
- \W: matches any non-alphanumeric character, equivalent to [^A-Za-z_0-9]
- \s: matches any whitespace character: spaces, tabs, newlines and Unicode spaces
- \S: matches any character that’s not whitespace
- \0: matches the NUL character
- \n: matches a newline character
- \t: matches a tab character
- \uXXXX: matches the Unicode character with the four-digit hexadecimal code XXXX (the \u{XXXX} form, with braces, requires the u flag)
- .: matches any character that is not a newline char (e.g. \n), unless you use the s flag, explained later on
- [^]: matches any character, including newline characters. It’s useful on multiline strings
- \b: matches a word boundary, that is, the position at the beginning or end of a word
- \B: matches any position that is not a word boundary
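A few of these are easier to grasp with a quick experiment. The sketch below (sample strings are made up for illustration) shows how \b and \B behave around the word cat, and how . differs from [^] when a newline is involved.

```javascript
// \b requires a word boundary on each side of "cat"
const wholeWord = /\bcat\b/.test('a cat sat')    // true
const insideWord = /\bcat\b/.test('concatenate') // false

// \B requires the opposite: "cat" must sit inside a longer word
const embedded = /\Bcat\B/.test('concatenate')   // true

// . stops at newlines, [^] does not
const dotAcrossLines = /a.b/.test('a\nb')        // false
const anyAcrossLines = /a[^]b/.test('a\nb')      // true

console.log(wholeWord, insideWord, embedded, dotAcrossLines, anyAcrossLines)
```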
Regular expression choices (or)
If you want to search one string or another, use the | operator:
/foo|bar/.test('foo') //✅
/foo|bar/.test('bar') //✅
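One detail to watch when combining | with anchors: the alternation splits the whole pattern, anchors included, so parentheses are usually needed to limit its scope. A small sketch, with hypothetical variable names:

```javascript
// Reads as "starts with foo" OR "ends with bar"
const loose = /^foo|bar$/

// Reads as "is exactly foo or exactly bar"
const strict = /^(foo|bar)$/

const looseOnFoosball = loose.test('foosball')   // true, it starts with foo
const strictOnFoosball = strict.test('foosball') // false
const strictOnBar = strict.test('bar')           // true

console.log(looseOnFoosball, strictOnFoosball, strictOnBar)
```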
Quantifiers
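Quantifiers are special operators that specify how many times the preceding item may repeat. Here are some of them:
?
: optional quantifier
Matches zero or one item. Imagine you need to check whether a string consists of a single digit. The first three lines below do that; adding ? makes the digit optional, so the empty string matches too: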
/^\d$/.test('1') //✅
/^\d$/.test('a') //❌
/^\d$/.test('') //❌
/^\d?$/.test('') //✅
+
: 1 or more
Matches one or more (>=1) items:
/^\d+$/.test('12') //✅
/^\d+$/.test('14') //✅
/^\d+$/.test('144343') //✅
/^\d+$/.test('') //❌
/^\d+$/.test('1a') //❌
*
: 0 or more
Matches zero or more (>=0) items:
/^\d*$/.test('12') //✅
/^\d*$/.test('14') //✅
/^\d*$/.test('144343') //✅
/^\d*$/.test('') //✅
/^\d*$/.test('1a') //❌
{n}
: fixed number of matches
Matches exactly n items:
/^\d{3}$/.test('123') //✅
/^\d{3}$/.test('12') //❌
/^\d{3}$/.test('1234') //❌
/^[A-Za-z0-9]{3}$/.test('Abc') //✅
{n,m}
: n to m number of matches
Matches between n and m times:
/^\d{3,5}$/.test('123') //✅
/^\d{3,5}$/.test('1234') //✅
/^\d{3,5}$/.test('12345') //✅
/^\d{3,5}$/.test('123456') //❌
m
can also be omitted, in that case, it will match at least n items:
/^\d{3,}$/.test('12') //❌
/^\d{3,}$/.test('123') //✅
/^\d{3,}$/.test('12345') //✅
/^\d{3,}$/.test('123456789') //✅
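With anchors, ranges, meta-characters and quantifiers in hand, the intimidating pattern from the introduction can be read piece by piece. The annotated sketch below tries it against a few made-up phone numbers; the sample inputs are assumptions, not part of the original example.

```javascript
// ^        start of the string
// \(*      zero or more opening parentheses
// \d{3}    exactly three digits
// \)*      zero or more closing parentheses
// ( |-)*   zero or more spaces or dashes
// \d{3}    exactly three digits
// ( |-)*   zero or more spaces or dashes
// \d{4}    exactly four digits
// $        end of the string
const phone = /^\(*\d{3}\)*( |-)*\d{3}( |-)*\d{4}$/

const withParens = phone.test('(555) 123-4567') // true
const withDashes = phone.test('555-123-4567')   // true
const digitsOnly = phone.test('5551234567')     // true
const tooShort = phone.test('555-1234')         // false

console.log(withParens, withDashes, digitsOnly, tooShort)
```

Because it relies on *, the pattern is lenient: it would also accept oddities such as several parentheses or dashes in a row.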
Escaping
As we saw already, there are certain characters which have a special meaning, but what if we want to match one of those characters literally? It is possible to escape special characters with \. Let's see an example:
/^\^$/.test('^') //✅
/^\$$/.test('$') //✅
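Escaping matters most when a pattern is built from text you do not control. JavaScript has long lacked a widely available built-in for this, so a small helper is commonly used. The function name escapeRegExp below is hypothetical, and the snippet is a sketch rather than a definitive implementation.

```javascript
// Hypothetical helper: prefixes every regex special character with a backslash
const escapeRegExp = (text) => text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')

const input = '1+1=2' // imagine this comes from a user

// Unescaped, + is read as a quantifier, so the literal text does not match
const unescapedMatches = new RegExp(input).test('1+1=2')             // false

// Escaped, every character is matched literally
const escapedMatches = new RegExp(escapeRegExp(input)).test('1+1=2') // true

console.log(escapeRegExp(input), unescapedMatches, escapedMatches)
```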
Groups
Using parentheses, you can create groups of characters: (...)
:
/^(\d{3})(\w+)$/.test('123') //❌
/^(\d{3})(\w+)$/.test('123s') //✅
/^(\d{3})(\w+)$/.test('123something') //✅
/^(\d{3})(\w+)$/.test('1234') //✅
You can also use the quantifiers (like the repetition or the optional quantifier) for a group:
/^(\d{2})+$/.test('12') //✅
/^(\d{2})+$/.test('123') //❌
/^(\d{2})+$/.test('1234') //✅
Groups are also very interesting, as when used with functions like match()
and exec()
as we saw before, they can be captured separately:
Example with exec()
:
const str = 'table football, foosball'
const regex = /foo/g
let result;
while ((result = regex.exec(str)) !== null) {
console.log(`Found ${result[0]} at ${result.index}.`);
}
-------------
Output
-------------
Found foo at 6.
Found foo at 16.
Example with match()
:
const paragraph = 'The quick brown fox jumps over the lazy dog. It barked.'
const regex = /[A-Z]/g
const found = paragraph.match(regex)
console.log(found)
-------------
Output
-------------
Array ["T", "I"]
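The two patterns above contain no parentheses, so they return only the full match. As a sketch of what capturing actually looks like, the variant below adds two groups to the football example; the pattern (foo)(\w+) is an assumption chosen for illustration.

```javascript
const str = 'table football, foosball'
const regex = /(foo)(\w+)/g

// exec() exposes each group at positions 1, 2, ... of the result
const found = []
let result
while ((result = regex.exec(str)) !== null) {
  found.push({ whole: result[0], prefix: result[1], rest: result[2], index: result.index })
}

// match() without the g flag returns the groups of the first match
const single = str.match(/(foo)(\w+)/)

// match() with the g flag returns full matches only, the groups are dropped
const fullOnly = str.match(/(foo)(\w+)/g)

console.log(found)
console.log(single[1], single[2]) // foo tball
console.log(fullOnly)             // [ 'football', 'foosball' ]
```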
Named capture groups
With ES2018 it is now possible to assign names to groups, which makes working with the results much easier. Take a look at the following example without named groups:
const re = /(\d{4})-(\d{2})-(\d{2})/
const result = re.exec('2015-01-02')
console.log(result)
-------------
Output
-------------
["2015-01-02", "2015", "01", "02", index: 0, input: "2015-01-02", groups: undefined]
Now using named groups:
const re = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/
const result = re.exec('2015-01-02')
console.log(result)
-------------
Output
-------------
(4) ["2015-01-02", "2015", "01", "02", index: 0, input: "2015-01-02", groups: {…}]
0: "2015-01-02"
1: "2015"
2: "01"
3: "02"
groups: {year: "2015", month: "01", day: "02"}
index: 0
input: "2015-01-02"
length: 4
Now, inside the groups property of the result, we can easily access each of them by name.
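Named groups pair nicely with destructuring, and they can also be referenced from a replacement string using $<name>. A short sketch with the same date pattern:

```javascript
const re = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/

// Pull the named groups straight into variables
const { year, month, day } = re.exec('2015-01-02').groups

// Reference the groups by name when replacing
const reordered = '2015-01-02'.replace(re, '$<day>/$<month>/$<year>')

console.log(year, month, day) // 2015 01 02
console.log(reordered)        // 02/01/2015
```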
Flags
As we have seen in several examples throughout the article, regular expressions have some flags which change the behavior of the matches:
- g: matches the pattern multiple times
- i: makes the regex case insensitive
- m: enables multiline mode. In this mode, ^ and $ match the beginning and end of each line. Without it, they match only the start and end of the whole string.
- u: enables support for unicode (introduced in ES6/ES2015)
- s: short for single line, it causes the . to match new line characters as well
Flags can be combined, and in the case of regex literals they are set at the end of the regex:
/hello/ig.test('Hello') //✅
Or using the constructor as a second parameter of the function:
new RegExp('hello', 'ig').test('Hello') //✅
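The m and s flags are the easiest to mix up, so here is a small sketch of both on a made-up two-line string:

```javascript
const text = 'first line\nsecond line'

// Without m, ^ only matches at the very start of the string
const withoutM = /^second/.test(text)     // false
// With m, ^ also matches right after each line break
const withM = /^second/m.test(text)       // true

// Without s, . refuses to cross the newline
const withoutS = /first.*second/.test(text) // false
// With s, . matches the newline too
const withS = /first.*second/s.test(text)   // true

console.log(withoutM, withM, withoutS, withS)
```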
That was a lot, enough with that, let's see some cool examples.
Cool Examples
Password Strength
^(?=.*[A-Z].*[A-Z])(?=.*[!@#$&*])(?=.*[0-9].*[0-9])(?=.*[a-z].*[a-z].*[a-z]).{8}$
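Checks a password's strength, useful if you want to build your own password validator. As written, it requires exactly 8 characters, including at least two uppercase letters, one special character from !@#$&*, two digits, and three lowercase letters. I know this is subjective, as different services may have different needs, but it's a great place to start.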
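Each (?=...) in this pattern is a lookahead: it checks that something appears further along without consuming any characters, which is how several independent rules can be stacked at the start. A quick sketch of trying it out, using made-up passwords:

```javascript
const strong = /^(?=.*[A-Z].*[A-Z])(?=.*[!@#$&*])(?=.*[0-9].*[0-9])(?=.*[a-z].*[a-z].*[a-z]).{8}$/

// 2 uppercase, 1 special, 2 digits, 3 lowercase, 8 characters in total
const ok = strong.test('ABc1d2e!')        // true

// Same ingredients but 9 characters, so .{8}$ rejects it
const tooLong = strong.test('ABc1d2e!x')  // false

// Right length but only lowercase letters
const tooWeak = strong.test('abcdefgh')   // false

console.log(ok, tooLong, tooWeak)
```

To accept longer passwords, the final .{8} could be relaxed to something like .{8,}.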
Validate email address
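/[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}/igm
Probably one of the most famous use cases for regular expressions: validating email addresses. Because the pattern has no ^ and $ anchors, it finds email-like substrings inside a larger text; anchor it if you need to validate a whole string.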
IP Addresses
V4:
/\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b/
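A brief sketch of the V4 pattern in use, both to test a value and to pull an address out of surrounding text. The sample inputs are made up for illustration.

```javascript
const ipv4 = /\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b/

const valid = ipv4.test('192.168.1.1')    // true
const outOfRange = ipv4.test('256.1.1.1') // false, 256 is above 255
const tooFewParts = ipv4.test('1.2.3')    // false, only three octets

// The pattern is unanchored, so it can also extract an address from a sentence
const extracted = 'ping 10.0.0.1 now'.match(ipv4)[0] // '10.0.0.1'

console.log(valid, outOfRange, tooFewParts, extracted)
```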
V6:
(([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7}|:)|fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]))
Pull domain from URL
/https?:\/\/(?:[-\w]+\.)?([-\w]+)\.\w+(?:\.\w+)?\/?.*/i
Example of use:
'https://livecodestream.com/'.match(/https?:\/\/(?:[-\w]+\.)?([-\w]+)\.\w+(?:\.\w+)?\/?.*/i)
-------------
Output
-------------
(2) ["https://livecodestream.com/", "livecodestream", index: 0, input: "https://livecodestream.com/", groups: undefined]
Pull image source
/< *img[^>]*src *= *["']?([^"' >]*)/
Example of use:
'<img src="https://livecodestream.com/featured.jpg" />'.match(/< *img[^>]*src *= *["']?([^"' >]*)/)
-------------
Output
-------------
(2) ["<img src="https://livecodestream.com/featured.jpg", "https://livecodestream.com/featured.jpg", index: 0, input: "<img src="https://livecodestream.com/featured.jpg" />", groups: undefined]
Credit Card Numbers
^(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|6(?:011|5[0-9][0-9])[0-9]{12}|3[47][0-9]{13}|3(?:0[0-5]|[68][0-9])[0-9]{11}|(?:2131|1800|35\d{3})\d{11})$
Conclusion
Regular expressions are a very powerful feature that can be intimidating at first, but once you get the hang of them they are pretty cool. Today we learned what they are, how to use them, how to build them, and saw some cool examples. I hope that the next time you see one of them in your projects you don't run away (like I did), and that you try to understand it and work with it.
Thanks so much for reading!
If you like the story, please don't forget to subscribe to our free newsletter so we can stay connected: https://livecodestream.dev/subscribe