Demystifying Regular Expressions with JavaScript

bajcmartinez

Juan Cruz Martinez

Posted on July 2, 2020

The first time I encountered a regular expression was many years ago, but I still remember my first thoughts on it:

  1. What is this string-like thing?
  2. I'd rather not touch it, it looks scary

I don't quite remember what that regex was doing, or exactly what it looked like, but it scared me to death. Looking back, I realize that it was probably not scary at all and that, in fact, it was an easy way to solve the problem at hand. But why did I get this feeling? It's just the awkwardness of the syntax: regular expressions certainly look strange, and if you don't know what they are, they look very complicated.

My intention here is not to scare you off. Regex can be simple once we understand it, but if you don't and you look at something like this:

^\(*\d{3}\)*( |-)*\d{3}( |-)*\d{4}$

it can be intimidating...

Today we are going to demystify regular expressions: we will see what they are, what they are useful for, and how you can design your own regular expressions to solve problems.


What are regular expressions

Regular expressions are a way to describe patterns in strings. They have their own syntax, as if they were their own programming language, and there are methods and ways to interact with regular expressions in most (if not all) programming languages.

But what kind of patterns are we talking about? Common examples of regular expressions determine whether a given string is an email address or a phone number, or verify that a password meets certain complexity requirements.

Once you have the pattern, what can you do with the regular expressions?

  • validate a string with the pattern
  • search within a string
  • replace substrings in a string
  • extract information from a string
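Each of the four uses above maps onto a built-in method. Here is a minimal sketch, assuming a deliberately simple date pattern and a sample sentence that are made up purely for illustration (the methods themselves are covered one by one further down):

```javascript
// Illustrative pattern: a date written as YYYY-MM-DD (no range checking).
const datePattern = /(\d{4})-(\d{2})-(\d{2})/;
const text = 'Released on 2020-07-02, updated later.';

// 1. Validate: does the string contain the pattern?
const isValid = datePattern.test(text);

// 2. Search: at which index does the first match start?
const position = text.search(datePattern);

// 3. Replace: swap the matched substring for something else.
const redacted = text.replace(datePattern, '[date]');

// 4. Extract: pull the captured pieces out of the match.
const [, year, month, day] = text.match(datePattern);

console.log(isValid);          // true
console.log(position);         // 12
console.log(redacted);         // Released on [date], updated later.
console.log(year, month, day); // 2020 07 02
```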

Working with regular expressions

For this article, we are going to cover how to work with regular expressions in JavaScript, though the concepts learned here apply to other languages as well. With that said, in other languages, there may be some differences in the way they treat regular expressions.

Let's look at an example that will validate if the string contains the word Hello or not.

In JavaScript there are two ways of going about this:

  • Constructor
  • Literal

Constructor

const regex = new RegExp('Hello')
const result = regex.test('Hello World!')
console.log(result)

--------------
Output
--------------
true

Literal

const regex = /Hello/
const result = regex.test('Hello World!')
console.log(result)

--------------
Output
--------------
true

In both scenarios, the variable regex is an object that exposes different methods we can use to interact with the regular expression. The first example has a more familiar look: it instantiates an object with a string as a parameter. The second looks a bit odd: there is something that resembles a string, but it is wrapped in / instead of quotes. As it turns out, both forms represent the same thing. I personally prefer the second option, which is very clean, and IDEs or code editors can apply syntax highlighting to the regular expression, whereas in the first scenario the pattern is defined as just a string.
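One practical difference between the two forms is how backslashes are written. The sketch below is only an illustration, and the `word` variable is a hypothetical runtime value: a pattern passed to the constructor as a string needs every backslash doubled, and in exchange the constructor lets you build a pattern from data that is only known at runtime.

```javascript
// In a literal, a backslash is written once.
const literal = /\d+\.\d+/;

// In a string passed to the constructor, each backslash must be doubled,
// because the string parser consumes one level of escaping first.
const constructed = new RegExp('\\d+\\.\\d+');

// The constructor is handy when part of the pattern is only known at runtime.
const word = 'Hello'; // hypothetical runtime value
const dynamic = new RegExp(`^${word}\\b`);

console.log(literal.source === constructed.source); // true
console.log(dynamic.test('Hello World!'));           // true
console.log(dynamic.test('Say Hello'));              // false, ^ anchors to the start
```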

So far our regular expression has been fairly simple: it is just an exact match on the string Hello, and it worked perfectly in JavaScript. However, the same regular expression can produce a different result in other languages. This is because each programming language can define its own defaults or special behaviors for regular expressions. Sorry about that, but that is just how it is. A regex will be largely the same across most programming languages, but before you use it somewhere else you will have to test it and adjust it if necessary.


Different uses of regular expressions

When working with regular expressions we are basically working with the RegExp object methods, or with string methods that allow us to interact with regular expressions.

RegExp.prototype.test()

The test() method executes a search for a match between a regular expression and a specified string. Returns true or false.

Example: Check whether the specified string contains the string foo

const str = 'table football'

const regex = RegExp('foo')
console.log(regex.test(str))

-------------
Output
-------------
true

RegExp.prototype.exec()

The exec() method executes a search for a match in a specified string. Returns a result array, or null.

Example: Look for all the instances of foo in the given string

const str = 'table football, foosball'
const regex = /foo/g

let result;
while ((result = regex.exec(str)) !== null) {
  console.log(`Found ${result[0]} at ${result.index}.`);
}

-------------
Output
-------------
Found foo at 6.
Found foo at 16.
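The loop above works because a regex created with the g flag remembers where its last match ended, in its lastIndex property. The same statefulness applies to test(), which can be surprising when one regex object is reused. A small sketch of that behavior, with variable names chosen for illustration:

```javascript
const regex = /foo/g;
const str = 'table football';

const firstCall = regex.test(str);    // true, "foo" found at index 6
const afterFirst = regex.lastIndex;   // 9, the next search resumes after "foo"
const secondCall = regex.test(str);   // false, nothing left after index 9
const afterSecond = regex.lastIndex;  // 0, reset after the failed search
const thirdCall = regex.test(str);    // true again, starting from 0

console.log(firstCall, afterFirst, secondCall, afterSecond, thirdCall);
```

If you only need a yes/no answer, leaving out the g flag avoids this surprise.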

String.prototype.match()

The match() method retrieves the result of matching a string against a regular expression.

Example: Find all the capital letters in a string

const paragraph = 'The quick brown fox jumps over the lazy dog. It barked.'
const regex = /[A-Z]/g
const found = paragraph.match(regex)
console.log(found)

-------------
Output
-------------
Array ["T", "I"]
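What match() returns depends on the g flag and on whether anything matched at all. A short sketch reusing the same paragraph (the variable names are illustrative):

```javascript
const paragraph = 'The quick brown fox jumps over the lazy dog. It barked.';

// With g: an array holding every full match, and nothing else.
const all = paragraph.match(/[A-Z]/g);

// Without g: only the first match, plus extra details such as index.
const first = paragraph.match(/[A-Z]/);

// No match at all: null, so guard before reading the result.
const none = paragraph.match(/\d/g);

console.log(all);                   // [ 'T', 'I' ]
console.log(first[0], first.index); // T 0
console.log(none);                  // null
```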

String.prototype.matchAll()

The matchAll() method returns an iterator of all results matching a string against a regular expression, including capturing groups.

Example: Find occurrences of a string in groups

const regexp = /t(e)(st(\d?))/g
const str = 'test1test2'

const arr = [...str.matchAll(regexp)]

console.log(arr)

-------------
Output
-------------
(2) [Array(4), Array(4)]
    -> 0: Array(4)
        0: "test1"
        1: "e"
        2: "st1"
        3: "1"
        groups: undefined
        index: 0
        input: "test1test2"
        lastIndex: (...)
        lastItem: (...)
        length: 4
        __proto__: Array(0)
    -> 1: Array(4)
        0: "test2"
        1: "e"
        2: "st2"
        3: "2"
        groups: undefined
        index: 5
        input: "test1test2"
        lastIndex: (...)
        lastItem: (...)
        length: 4
    __proto__: Array(0)
    lastIndex: (...)
    lastItem: (...)
    length: 2

String.prototype.search()

The search() method executes a search for a match between a regular expression and this string object. It returns the index at which the match happened, or -1 if there is no match.

Example: Find the position of any character that is not a word character or white space

const paragraph = 'The quick brown fox jumps over the lazy dog. If the dog barked, was it really lazy?'

// any character that is not a word character or whitespace
const regex = /[^\w\s]/g;

console.log(paragraph.search(regex));
console.log(paragraph[paragraph.search(regex)]);

-------------
Output
-------------
43
.

String.prototype.replace()

The replace() method returns a new string with some or all matches of a pattern replaced by a replacement. The pattern can be a string or a RegExp, and the replacement can be a string or a function to be called for each match. If the pattern is a string, only the first occurrence will be replaced.

Note that the original string will remain unchanged.

Example: Replace the word dog with monkey

const paragraph = 'The quick brown fox jumps over the lazy dog. If the dog barked, was it really lazy?'

const regex = /dog/gi

console.log(paragraph.replace(regex, 'monkey'))
console.log(paragraph.replace('dog', 'monkey'))

-------------
Output
-------------
The quick brown fox jumps over the lazy monkey. If the monkey barked, was it really lazy?
The quick brown fox jumps over the lazy monkey. If the dog barked, was it really lazy?

Don't be misled here: when replace() receives a string as the pattern, that string is matched literally, not interpreted as a regular expression, and only its first occurrence is replaced. That is why the second console.log replaced the word dog only once, whereas the regular expression with the g flag replaced every occurrence. We will cover flags later on.
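The description of replace() also mentions that the replacement can be a function. Here is a minimal sketch of that form, next to a string pattern being matched character for character. The sample strings are made up for illustration:

```javascript
const paragraph = 'The quick brown fox jumps over the lazy dog.';

// Replacement function: called once per match; its return value is inserted.
const shouted = paragraph.replace(/quick|lazy/g, (match) => match.toUpperCase());

// A string pattern is matched literally: the dot here is a plain dot,
// not the "any character" wildcard, and only the first one is replaced.
const dashed = 'a.b.c'.replace('.', '-');

console.log(shouted); // The QUICK brown fox jumps over the LAZY dog.
console.log(dashed);  // a-b.c
```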

String.prototype.replaceAll()

The replaceAll() method returns a new string with all matches of a pattern replaced by a replacement. The pattern can be a string or a RegExp, and the replacement can be a string or a function to be called for each match.

Example: Replace the word dog with monkey

const paragraph = 'The quick brown fox jumps over the lazy dog. If the dog barked, was it really lazy?'

const regex = /dog/gi

console.log(paragraph.replaceAll(regex, 'monkey'))
console.log(paragraph.replaceAll('dog', 'monkey'))

-------------
Output
-------------
The quick brown fox jumps over the lazy monkey. If the monkey barked, was it really lazy?
The quick brown fox jumps over the lazy monkey. If the monkey barked, was it really lazy?

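Similar to before, but now all the matches are replaced. Note that when the pattern is a regular expression it must carry the g flag, otherwise replaceAll() throws a TypeError. I usually avoid this function, as I can always achieve the same result with replace() and a regular expression, and replaceAll() is not supported on all platforms/browsers.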

String.prototype.split()

The split() method divides a String into an ordered set of substrings, puts these substrings into an array, and returns the array. The division is done by searching for a pattern, which is provided as the first parameter in the method's call.

Example: Split a string at every digit

const str = 'a1 b2 c3 d4 la f5'
const sections = str.split(/\d/);
console.log(sections)

-------------
Output
-------------
[ 'a', ' b', ' c', ' d', ' la f', '' ]
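A regular expression separator is most useful when the delimiter is not always written the same way. The sketch below uses an invented comma-separated line with uneven spacing, and also shows one way to drop the empty trailing entry seen in the output above:

```javascript
// Split on a comma surrounded by any amount of whitespace.
const csvLine = 'apples ,  pears,bananas , kiwis';
const items = csvLine.split(/\s*,\s*/);

// A separator at the very end produces an empty last element;
// filter(Boolean) removes empty strings from the result.
const sections = 'a1 b2 c3'.split(/\d/).filter(Boolean);

console.log(items);    // [ 'apples', 'pears', 'bananas', 'kiwis' ]
console.log(sections); // [ 'a', ' b', ' c' ]
```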

Building regular expressions

Now that we know how to work with regular expressions and the different methods which are available to interact with them, let's spend some time building regular expressions to match the patterns we want.

Anchoring

/hello/

will match hello wherever it appears inside the string. If you want to match strings that start with hello, use the ^ operator:

/^hello/.test('hello world')            //✅
/^hello/.test('from JS, hello world')   //❌

If you want to match strings that end with a given word, world in this example, use the $ operator:

/world$/.test('hello world')    //✅
/world$/.test('hello world!')   //❌

You can also combine them to find exact matches:

/^hello$/.test('hello')     //✅

To find strings with wildcards in the middle you can use .*, which matches any character repeated 0 or more times:

/^hello.*Juan$/.test('hello world Juan')      //✅
/^hello.*Juan$/.test('helloJuan')             //✅
/^hello.*Juan$/.test('hello Juan!')           //❌

Match items by character or numeric range

One very cool feature of regular expressions is the ability to match by character or numeric range. What do I mean by range? Something like:

/[a-z]/ // a, b, c ..., x, y, z
/[A-Z]/ // A, B, C ..., X, Y, Z
/[a-d]/ // a, b, c, d
/[0-9]/ // 0, 1, 2, ..., 8, 9

These types of regex patterns will match when at least one of the characters in the string falls within the range:

/[a-z]/.test('a')      //✅
/[a-z]/.test('1')      //❌
/[a-z]/.test('A')      //❌

/[a-d]/.test('z')      //❌
/[a-d]/.test('zdz')    //✅

You can also combine ranges:

/[a-zA-Z0-9]/.test('a')  //✅
/[a-zA-Z0-9]/.test('1')  //✅
/[a-zA-Z0-9]/.test('Z')  //✅

Negating a pattern

We saw that the ^ character at the beginning of a pattern anchors it to the beginning of a string. However, when used as the first character inside the brackets of a range, it negates it, so:

/[^a-zA-Z0-9]/.test('a')  //❌
/[^a-zA-Z0-9]/.test('1')  //❌
/[^a-zA-Z0-9]/.test('Z')  //❌
/[^a-zA-Z0-9]/.test('@')  //✅

Meta-characters

There are special characters in regular expressions that have a special meaning. Some of them are:

  • \d matches any digit, equivalent to [0-9]
  • \D matches any character that’s not a digit, equivalent to [^0-9]
  • \w matches any alphanumeric character (plus underscore), equivalent to [A-Za-z_0-9]
  • \W matches any non-alphanumeric character, equivalent to [^A-Za-z_0-9]
  • \s matches any whitespace character: spaces, tabs, newlines and Unicode spaces
  • \S matches any character that’s not a whitespace
  • \0 matches the NUL character
  • \n matches a newline character
  • \t matches a tab character
  • \uXXXX matches the Unicode character with the four-digit hexadecimal code XXXX (the \u{XXXXX} form for larger code points requires the u flag)
  • . matches any character that is not a newline char (e.g. \n) (unless you use the s flag, explained later on)
  • [^] matches any character, including newline characters. It’s useful on multiline strings
  • \b matches a word boundary: the position at the beginning or end of a word, without consuming any characters
  • \B matches any position that is not a word boundary
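A few of the meta-characters from the list can be tried out directly. The test strings in this sketch are arbitrary examples chosen to show one matching and one non-matching case each:

```javascript
// \d, \w and \s each match one character of a given class.
const hasDigit = /\d/.test('room 42');             // true
const isWord = /^\w+$/.test('snake_case');         // true
const isWordKebab = /^\w+$/.test('kebab-case');    // false, "-" is not a word character

// \b matches a position between a word character and a non-word character.
const wholeWord = /\bcat\b/.test('a cat sat');     // true
const insideWord = /\bcat\b/.test('concatenate');  // false
const nonBoundary = /\Bcat\B/.test('concatenate'); // true

// . stops at a newline, [^] does not.
const dotNewline = /a.b/.test('a\nb');             // false
const anyNewline = /a[^]b/.test('a\nb');           // true

console.log(hasDigit, isWord, isWordKebab);
console.log(wholeWord, insideWord, nonBoundary);
console.log(dotNewline, anyNewline);
```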

Regular expression choices (or)

If you want to match one string or another, use the | operator:

/foo|bar/.test('foo')  //✅
/foo|bar/.test('bar')  //✅
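One thing to watch for is how | interacts with anchors. The following sketch (the names `loose` and `strict` are just labels for this example) wraps the choice in parentheses, which are covered in the Groups section, so that both anchors apply to the whole choice:

```javascript
// Without parentheses, each anchor belongs to only one alternative:
// "starts with foo" OR "ends with bar".
const loose = /^foo|bar$/;

// With parentheses, both anchors apply to the whole choice:
// the string must be exactly "foo" or exactly "bar".
const strict = /^(foo|bar)$/;

console.log(loose.test('food'));     // true
console.log(loose.test('crowbar'));  // true
console.log(strict.test('food'));    // false
console.log(strict.test('crowbar')); // false
console.log(strict.test('bar'));     // true
```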

Quantifiers

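Quantifiers are special operators that specify how many times the preceding item may repeat. Here are some of them:

?: optional quantifier
Makes the preceding item optional, so it matches zero or one time. Imagine you need to check that a string is a single digit; adding ? also accepts the empty string: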

/^\d$/.test('1')  //✅
/^\d$/.test('a')  //❌
/^\d$/.test('')   //❌

/^\d?$/.test('')  //✅

+: 1 or more
Matches one or more (>=1) items:

/^\d+$/.test('12')      //✅
/^\d+$/.test('14')      //✅
/^\d+$/.test('144343')  //✅
/^\d+$/.test('')        //❌
/^\d+$/.test('1a')      //❌

*: 0 or more
Matches zero or more (>=0) items:

/^\d*$/.test('12')      //✅
/^\d*$/.test('14')      //✅
/^\d*$/.test('144343')  //✅
/^\d*$/.test('')        //✅
/^\d*$/.test('1a')      //❌

{n}: fixed number of matches
Matches exactly n items:

/^\d{3}$/.test('123')           //✅
/^\d{3}$/.test('12')            //❌
/^\d{3}$/.test('1234')          //❌

/^[A-Za-z0-9]{3}$/.test('Abc')  //✅

{n,m}: n to m number of matches
Matches between n and m times:

/^\d{3,5}$/.test('123')    //✅
/^\d{3,5}$/.test('1234')   //✅
/^\d{3,5}$/.test('12345')  //✅
/^\d{3,5}$/.test('123456') //❌

m can also be omitted, in that case, it will match at least n items:

/^\d{3,}$/.test('12')         //❌
/^\d{3,}$/.test('123')        //✅
/^\d{3,}$/.test('12345')      //✅
/^\d{3,}$/.test('123456789')  //✅

Escaping

As we saw already, there are certain characters which have a special meaning, but what if we want to match one of those characters literally? It is possible to escape special characters with \. Let's see an example:

/^\^$/.test('^')  //✅
/^\$$/.test('$')  //✅
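Escaping matters most when a pattern is built from text you don't control, such as user input passed to the RegExp constructor. The sketch below uses a hypothetical helper named escapeRegExp; it is not a built-in and its name is an assumption made for this example. It prefixes every special character with a backslash:

```javascript
// Hypothetical helper (not a built-in): puts a backslash in front of
// every regex special character so that it is matched literally.
// In the replacement string, $& stands for the text that was matched.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const userInput = '1+1=2 (really?)'; // made-up input containing special characters
const pattern = new RegExp(escapeRegExp(userInput));

console.log(escapeRegExp('a.b'));                          // a\.b
console.log(pattern.test('I think 1+1=2 (really?)'));      // true
console.log(new RegExp('a.b').test('axb'));                // true, the dot is a wildcard
console.log(new RegExp(escapeRegExp('a.b')).test('axb'));  // false, the dot is literal
```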

Groups

Using parentheses, you can create groups of characters: (...):

/^(\d{3})(\w+)$/.test('123')           //❌
/^(\d{3})(\w+)$/.test('123s')          //✅
/^(\d{3})(\w+)$/.test('123something')  //✅
/^(\d{3})(\w+)$/.test('1234')          //✅

You can also apply quantifiers (like the repetition or the optional quantifier) to a group:

/^(\d{2})+$/.test('12')   //✅
/^(\d{2})+$/.test('123')  //❌
/^(\d{2})+$/.test('1234') //✅

Groups are also very interesting because, when used with functions like match() and exec(), which we saw before, the text matched by each group is captured separately and returned alongside the full match:

Example with exec():

const str = 'table football, foosball'
// two groups: the literal foo, and the rest of the word
const regex = /(foo)(\w+)/g

let result;
while ((result = regex.exec(str)) !== null) {
  console.log(`Found ${result[0]} at ${result.index}: groups "${result[1]}" and "${result[2]}".`);
}

-------------
Output
-------------
Found football at 6: groups "foo" and "tball".
Found foosball at 16: groups "foo" and "sball".

Example with match():

const paragraph = 'The quick brown fox jumps over the lazy dog. It barked.'
// no g flag, so match() also returns the captured groups
const regex = /(\w+) (\w+)/
const found = paragraph.match(regex)
console.log(found[0])
console.log(found[1])
console.log(found[2])

-------------
Output
-------------
The quick
The
quick

Named capture groups

With ES2018 it is now possible to assign names to groups, so that working with the results is much easier. Take a look at the following example without named groups:

const re = /(\d{4})-(\d{2})-(\d{2})/
const result = re.exec('2015-01-02')
console.log(result)

-------------
Output
-------------
["2015-01-02", "2015", "01", "02", index: 0, input: "2015-01-02", groups: undefined]

Now using named groups:

const re = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/
const result = re.exec('2015-01-02')
console.log(result)

-------------
Output
-------------
(4) ["2015-01-02", "2015", "01", "02", index: 0, input: "2015-01-02", groups: {}]
    0: "2015-01-02"
    1: "2015"
    2: "01"
    3: "02"
    groups: {year: "2015", month: "01", day: "02"}
    index: 0
    input: "2015-01-02"
    length: 4

Now each captured value can be read by name from the groups property of the result, for example result.groups.year.
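Two common ways of using the named values are sketched below, reusing the same date pattern: destructuring the groups object, and referring to a group as $<name> inside a replacement string. The variable names are chosen for illustration.

```javascript
const re = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;

// Destructure the named groups straight out of the result.
const { year, month, day } = re.exec('2015-01-02').groups;

// In replace(), $<name> inserts the text captured by that named group.
const european = '2015-01-02'.replace(re, '$<day>/$<month>/$<year>');

console.log(year, month, day); // 2015 01 02
console.log(european);         // 02/01/2015
```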

Flags

As we have seen throughout the article, regular expressions have some flags that change the behavior of the matches:

  • g: global search, finds all matches instead of stopping after the first one
  • i: makes the regex case insensitive
  • m: enables multiline mode. In this mode, ^ and $ match the start and end of each line. Without it, they match only the start and end of the whole string, even if it contains multiple lines.
  • u: enables support for unicode (introduced in ES6/ES2015)
  • s: short for single line (also known as dotAll), it causes the . to match new line characters as well

Flags can be combined, and in the case of regex literals they are set at the end of the regex:

/hello/ig.test('Hello') //✅

Or, using the constructor, as the second parameter:

new RegExp('hello', 'ig').test('Hello') //✅
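To see what the m and s flags actually change, here is a small sketch run against an invented two-line string, comparing each pattern with and without the flag:

```javascript
const text = 'first line\nsecond line';

// m: ^ and $ also match at the start and end of each line.
const withoutM = /^second/.test(text);     // false, "second" is not at the start of the string
const withM = /^second/m.test(text);       // true, it is at the start of a line

// s: . also matches newline characters.
const withoutS = /first.*second/.test(text); // false, . stops at the \n
const withS = /first.*second/s.test(text);   // true

// Flags combined: every case-insensitive "LINE" at the end of a line.
const lineEnds = text.match(/LINE$/gim);

console.log(withoutM, withM, withoutS, withS);
console.log(lineEnds); // [ 'line', 'line' ]
```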

That was a lot, enough with that, let's see some cool examples.


Cool Examples

Password Strength

^(?=.*[A-Z].*[A-Z])(?=.*[!@#$&*])(?=.*[0-9].*[0-9])(?=.*[a-z].*[a-z].*[a-z]).{8}$

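Checks a password's strength, useful if you want to build your own password validator. As written, it accepts passwords of exactly 8 characters that contain at least two uppercase letters, one special character from !@#$&*, two digits and three lowercase letters. I know this is subjective, as different services may have different needs, but it's a great place to start.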
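The pattern relies on lookaheads: each (?=...) checks that something can be found further along in the string without consuming any characters, so several independent conditions can be stacked before the final .{8}. A quick sketch that exercises the pattern with a few made-up passwords:

```javascript
const strongPassword =
  /^(?=.*[A-Z].*[A-Z])(?=.*[!@#$&*])(?=.*[0-9].*[0-9])(?=.*[a-z].*[a-z].*[a-z]).{8}$/;

// Made-up sample passwords, one passing and three failing for different reasons.
const passes = strongPassword.test('ABc1d2e!');       // true, meets every rule
const tooLong = strongPassword.test('ABc1d2e!x');     // false, 9 characters
const oneUpper = strongPassword.test('Abc1d2e!');     // false, only one uppercase letter
const lettersOnly = strongPassword.test('abcdefgh');  // false, no digits, uppercase or specials

console.log(passes, tooLong, oneUpper, lettersOnly);
```

To allow longer passwords, the final .{8} could be changed to .{8,}.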

Validate email address

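/[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}/i

Probably one of the most famous use cases for regular expressions: validating email addresses. Note that the dot before the top-level domain is escaped as \. so that it matches a literal dot. This is a pragmatic approximation rather than a complete validator; for instance, {2,4} rejects top-level domains longer than four letters.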

IP Addresses

V4:

/\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b/

V6:

(([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7}|:)|fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]))

Pull domain from URL

/https?:\/\/(?:[-\w]+\.)?([-\w]+)\.\w+(?:\.\w+)?\/?.*/i

Example of use:

'https://livecodestream.com/'.match(/https?:\/\/(?:[-\w]+\.)?([-\w]+)\.\w+(?:\.\w+)?\/?.*/i)

-------------
Output
-------------
(2) ["https://livecodestream.com/", "livecodestream", index: 0, input: "https://livecodestream.com/", groups: undefined]

Pull image source

/< *img[^>]*src *= *["']?([^"' >]*)/

Example of use:

'<img src="https://livecodestream.com/featured.jpg" />'.match(/< *img[^>]*src *= *["']?([^"' >]*)/)

-------------
Output
-------------
(2) ["<img src="https://livecodestream.com/featured.jpg", "https://livecodestream.com/featured.jpg", index: 0, input: "<img src="https://livecodestream.com/featured.jpg" />", groups: undefined]

Credit Card Numbers

^(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|6(?:011|5[0-9][0-9])[0-9]{12}|3[47][0-9]{13}|3(?:0[0-5]|[68][0-9])[0-9]{11}|(?:2131|1800|35\d{3})\d{11})$

Conclusion

Regular expressions are a very powerful feature that can be intimidating at first, but once you get the hang of them they are pretty cool. Today we learned what they are, how to use them, how to build them, and saw some cool examples. I hope that the next time you see one of them in your projects you don't run away (like I did), and that you try to understand it and work with it.

Thanks so much for reading!


If you like the story, please don't forget to subscribe to our free newsletter so we can stay connected: https://livecodestream.dev/subscribe
