How to use Regular Expressions in JavaScript
Charlotte
Posted on December 4, 2020
"Some people, when confronted with a coding problem, think 'I know, I'll use regular expressions.' Now they have two problems."
Jamie Zawinski (world class hacker)
What are Regular Expressions
A regular expression (RegEx) is a sequence of characters that defines a pattern for matching, locating, and managing text. To those well versed in this dark art, RegEx can be incredibly powerful; to the rest of us, it can be a source of bewilderment and confusion, or so I thought. A closer look while practicing algorithm problems for interviews showed me that regular expressions are perhaps not as tricky as I once believed and can be extremely useful. The subject is extensive and cannot possibly be covered in one article, but I wish to share a few key things that really opened my eyes to how powerful RegEx can be.
Testing for a match on a string
What if you needed to know if a string has a particular word in it? You could just do the below:
const string = 'The cat sat on the mat'
const regex = /sat/
regex.test(string)
// result: true
This 'tests' the string to see whether the characters 'sat' appear anywhere in it.
The / / on the second line tells JavaScript that the characters in between form a regular expression. The RegEx can then be combined with the test( ) method to check the string.
As the result is just a boolean (true or false), it can easily be combined with an if/else statement or ternary operator to carry out further actions depending on whether the pattern is found or not.
Used with an if/else statement:
const string = 'The cat sat on the mat'
const regex = /sat/
if (regex.test(string)) {
  console.log('The word sat can be found in the string')
} else {
  console.log('The word sat is not in the string')
}
// logs: 'The word sat can be found in the string'
Used with a ternary operator:
const string = 'The cat sat on the mat'
const regex = /sat/
const result = regex.test(string) ? 'The word sat can be found in the string' : 'The word sat is not in the string'
// result: 'The word sat can be found in the string'
To further enhance this, the RegEx can include 'i' at the end of the expression like so:
/sat/i
The 'i' flag makes the test case-insensitive, so the match succeeds whether or not the word to be found contains capital letters.
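To see the difference the 'i' flag makes, here is a minimal sketch. The sentence and variable names are made up purely for illustration:

```javascript
// Made-up input: note the capital 'S' in 'Sat'
const sentence = 'The cat Sat on the mat';

const caseSensitive = /sat/;
const caseInsensitive = /sat/i;

console.log(caseSensitive.test(sentence));   // false ('Sat' has a capital S)
console.log(caseInsensitive.test(sentence)); // true
```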
Return the actual matched characters not just true or false
What if you need to capture the match itself for further use rather than just confirming whether the match is there or not?
This can be achieved using the match( ) method. Note the syntax order is slightly different here: match( ) is called on the string, and the RegEx goes inside the parentheses.
const string = '989hjk976'
const regex = /[a-z]/gi
console.log(string.match(regex))
// result: ['h', 'j', 'k']
The [ ] specifies a character set, and a-z inside it is a range covering the lowercase letters; any single character within the set will be a match. (Because the 'i' flag is also used in the example, uppercase letters would match too.) You could search for digits instead using [0-9] or capitals using [A-Z]. You can also use the shorthand '\w' (without quotes), which matches any word character and is equivalent to '[a-zA-Z0-9_]' (note the included underscore).
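As a rough sketch of those alternatives side by side (the sample string and variable names are my own, chosen only to show each set at work):

```javascript
// Made-up sample containing lowercase, an underscore, digits, a space and capitals
const sample = 'abc_123 DEF';

const digits = sample.match(/[0-9]/g);
const capitals = sample.match(/[A-Z]/g);
const wordChars = sample.match(/\w/g);

console.log(digits);    // ['1', '2', '3']
console.log(capitals);  // ['D', 'E', 'F']
console.log(wordChars); // ['a', 'b', 'c', '_', '1', '2', '3', 'D', 'E', 'F'] (the space is skipped)
```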
The 'g' stands for global which means, 'show all the matches, not just the first one' (the RegEx reads from left to right when searching and will stop on the first positive match unless you specify otherwise).
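To make the effect of 'g' concrete, this small sketch runs the same set with and without the flag (variable names are just for illustration):

```javascript
const mixed = '989hjk976';

const firstOnly = mixed.match(/[a-z]/);   // no 'g': stops at the first match
const allMatches = mixed.match(/[a-z]/g); // 'g': collects every match

console.log(firstOnly[0]);    // 'h'
console.log(firstOnly.index); // 3 (position of the first match in the string)
console.log(allMatches);      // ['h', 'j', 'k']
```

Without 'g', the returned array also carries extra details such as the index of the match, which can be handy if you need to know where it was found.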
There are other special characters (strictly speaking these are quantifiers and metacharacters rather than flags) you can use if you need to be more specific:
The '+'
const string = 'abc123DEF'
const regex = /[a-zA-Z]+/g
console.log(string.match(regex))
// result: ['abc', 'DEF']
// Note the + which means match 1 or more in a row
The '.'
const string = 'abc123DEF'
const regex = /[a-z]./g
console.log(string.match(regex))
// result: ['ab', 'c1']
// The '.' matches any single character (except a line break), so each match is a lowercase letter plus whatever character follows it
The '^'
You can also choose to NOT match something using the '^' but be careful WHERE you use it.
const onlyReturnIfConsonant = (str) => {
const regex = /^[^aeiou]/
const result = str.match(regex)
console.log(result)
}
// onlyReturnIfConsonant("bananas"); // result: ['b']
// onlyReturnIfConsonant("email"); // result: null
The '^' OUTSIDE the [ ] means only check the START of the string.
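The '^' INSIDE the [ ] means match any character NOT in the specified set. So here only strings starting with a consonant will return a result (strictly speaking, anything that is not a lowercase vowel, so a digit or capital letter at the start would also match).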
The order can be important so accuracy is required when constructing the RegEx.
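To show how much the position of '^' changes the meaning, here is a small sketch with two patterns that differ only in where the '^' sits (the pattern names are my own):

```javascript
const startsWithVowel = /^[aeiou]/; // '^' outside the set: anchor to the start, then a vowel
const anyNonVowel = /[^aeiou]/;     // '^' inside the set: first character that is not a vowel

const word = 'email';
const vowelAtStart = word.match(startsWithVowel);
const firstNonVowel = word.match(anyNonVowel);
const noMatch = 'bananas'.match(startsWithVowel);

console.log(vowelAtStart[0]);  // 'e'
console.log(firstNonVowel[0]); // 'm'
console.log(noMatch);          // null ('bananas' does not start with a vowel)
```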
There are many other special characters and flags, and these can often be combined (when it makes logical sense to), but the ones above give an idea of what is possible. A great resource covering more of the specifics regarding RegEx and match( ) can be found here.
Formatting in place using RegEx and split( )
What if, instead of just capturing the match, you wanted to act on it at the same time? One possible scenario uses the split( ) method. This method divides a string into an ordered list of substrings and returns them in an array. This can be very useful, but how do you describe where you want the string to be separated? This is where RegEx is really helpful. The example below shows a potential use case inside a function:
const separateAString = (str) => {
return str.split(/\s+|\_+|(?=[A-Z])/).join(' ')
}
separateAString('TheCat_Sat onTheMat');
// result: ['The', 'Cat', 'Sat', 'on', 'The', 'Mat'] (before join())
// result: 'The Cat Sat on The Mat' (after join(" "), with spaces now included)
As you can see, the RegEx has made this possible but what on earth does it mean?
/\s+|\_+|(?=[A-Z])/
The \s looks for any whitespace characters (the + means 1 or more).
The \_ looks for any underscores. This is an example of an escaped character, where the backslash changes how the following character is interpreted: 's' on its own is treated as an actual 's', whereas '\s' is treated as a whitespace character. More often the backslash works the other way round and removes a special meaning, so '\.' matches a literal full stop rather than any character. The escape is not necessary here (an underscore has no special meaning in a JavaScript RegEx) but it has been used just to give an example. A '+' is also included to capture 1 or more consecutive underscores.
The '( )' groups part of an expression so that it can be treated as a single unit. Plain parentheses create a capture group, but when the group begins with '?=' it becomes a lookahead instead, which checks what comes next without capturing or consuming any characters.
The '(?=[A-Z])' is an example of a positive lookahead which, in this case, means: 'split the string just before any capital letter' (the capital letter itself is kept).
The | means 'or' in RegEx and is demonstrated here separating the 3 parts of the expression so: 'split wherever there is a whitespace or an underscore or just before a capital letter'. This ability to chain together different parts of an expression highlights one of the reasons RegEx can be so powerful.
The join( ) method then completes the process by converting the array back into a string. The ' ' (passed as the argument to join( )) places a single space between each element, so a space appears at every location where split( ) divided the string.
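It can help to try each part of the expression on its own before combining them. The sketch here does that with the same input string; the variable names are mine and only there for illustration:

```javascript
const input = 'TheCat_Sat onTheMat';

const byWhitespace = input.split(/\s+/);        // split on runs of whitespace only
const byUnderscore = input.split(/_+/);         // split on runs of underscores only
const beforeCapitals = input.split(/(?=[A-Z])/); // split just before each capital letter
const combined = input.split(/\s+|_+|(?=[A-Z])/);

console.log(byWhitespace);   // ['TheCat_Sat', 'onTheMat']
console.log(byUnderscore);   // ['TheCat', 'Sat onTheMat']
console.log(beforeCapitals); // ['The', 'Cat_', 'Sat on', 'The', 'Mat']
console.log(combined);       // ['The', 'Cat', 'Sat', 'on', 'The', 'Mat']
```

Notice that the lookahead keeps the capital letters, while the whitespace and underscores are removed because they are part of the match itself.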
Amending in place using regex and replace( )
As a final example, what if you wanted to find something in a string and replace what you've found with something else in a single step? This can be achieved with the replace( ) method.
Here is a basic example of replace( ) used inside a function:
const replaceExample = (str) => {
return str.replace('Test', 'Game')
}
replaceExample('This is a Test');
// result: 'This is a Game'
The method takes two arguments: the first is the part of the passed-in string to be replaced, and the second is what to replace it with.
The first argument can be a string or a regular expression. If a string is used (as per the example above), only the first occurrence will be replaced, so RegEx can already prove its value here (remember the 'g' flag).
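A quick sketch of that difference, using a made-up phrase that contains the word twice:

```javascript
// Made-up input with two occurrences of 'Test'
const phrase = 'Test one, Test two';

const firstReplaced = phrase.replace('Test', 'Game'); // string: first occurrence only
const allReplaced = phrase.replace(/Test/g, 'Game');  // RegEx with 'g': every occurrence

console.log(firstReplaced); // 'Game one, Test two'
console.log(allReplaced);   // 'Game one, Game two'
```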
The example below shows a regex example with replace( ):
const separateStrings = (str) => {
return str.replace(/([a-z])([A-Z])/g, '$1 $2')
}
separateStrings('AnotherStringToSeparate');
// result: 'Another String To Separate'
This demonstrates a new technique.
This example includes two capture groups (remember the '( )' from a previous example?). The first matches a single lowercase letter [a-z]. The second matches a single uppercase letter [A-Z].
The second parameter '$1 $2' is a direct reference to these capture groups. $1 refers to the first capture group ([a-z]), $2 refers to the second capture group ([A-Z]). By taking these together in quotes and putting a space between them like so: '$1 $2' you are saying 'wherever a lowercase letter is immediately followed by an uppercase letter, put a space between them'. If you use '$1-$2' instead, the string will contain a '-' between each word like this: 'Another-String-To-Separate'. This is quite a dynamic feature and could enable any number of possibilities depending on how you structure your code and RegEx. When I found this out I thought it was pretty cool!
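For completeness, here is the hyphenated variation as a small sketch (the function name hyphenateStrings is just one I made up for this example):

```javascript
// Same pattern as before, but with a hyphen between the two capture groups
const hyphenateStrings = (str) => {
  return str.replace(/([a-z])([A-Z])/g, '$1-$2');
};

const hyphenated = hyphenateStrings('AnotherStringToSeparate');
console.log(hyphenated); // 'Another-String-To-Separate'
```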
Adding spaces or characters isn't the only thing you can do either, the example below shows how you can define two capture groups then switch them round as if you were shuffling a pack of cards:
const shuffleAWord = (str) => {
return str.replace(/(^[^aeiou]+)(\w*)/, '$2$1');
}
shuffleAWord("grain");
// result: 'aingr'
// in this case '$1' is 'gr', '$2' is 'ain'
The first capture group '(^[^aeiou]+)' gathers every character from the beginning of the word that is not a lowercase vowel (here, the leading consonants) and stops when it gets to a vowel. In this example this returns as 'gr'.
The second capture group gathers up all the word characters (\w*, meaning letters, digits and underscores) not picked up in the first group. The '*' means 'match 0 or more of the characters referred to before it'. In this example, this returns as 'ain'.
The second parameter in the replace method again shows a reference to the capture groups '$1 and $2' but this time they have been switched around and joined together '$2$1'. This then results in the following: 'aingr'.
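If you want to check what each group has picked up, match( ) with the same pattern exposes them. This sketch also tries a word starting with a vowel, which I added as an extra case: because the first group needs at least one non-vowel at the start, the pattern does not match and the string comes back unchanged.

```javascript
const shufflePattern = /(^[^aeiou]+)(\w*)/;

// Without the 'g' flag, match() returns the full match followed by each capture group
const groups = 'grain'.match(shufflePattern);
console.log(groups[0]); // 'grain' (the whole match)
console.log(groups[1]); // 'gr'    ($1)
console.log(groups[2]); // 'ain'   ($2)

// A word beginning with a vowel does not match, so nothing is replaced
const unchanged = 'apple'.replace(shufflePattern, '$2$1');
console.log(unchanged); // 'apple'
```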
Conclusion
The above examples are deliberately contrived but their purpose is to show how configurable and flexible RegEx can be when used with the methods JavaScript provides. There are many other examples but this is just a sample of those I recently found useful.
To finish, there are just a few final points worth mentioning:
- Despite its power and usefulness, it is advised not to overuse RegEx because it can make your code difficult to read
- If a RegEx string looks like it has the power to confuse, make sure to add some comments to help clarify what it's doing
- Keep it as simple and as readable as possible
- Constructing RegEx can be tricky but there are some really useful tools out there such as this one and this one which can make the process much easier
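As one possible way of following the advice on comments and readability, the split( ) pattern from earlier could be built from named, commented pieces. This is only a sketch of one approach: it uses the RegExp constructor (note the doubled backslash needed inside a string), and all of the names are my own.

```javascript
// Each alternative gets a name and a comment so the intent is readable at a glance
const whitespaceRun = '\\s+';      // one or more whitespace characters
const underscoreRun = '_+';        // one or more underscores
const beforeCapital = '(?=[A-Z])'; // the position just before a capital letter

// Join the pieces with '|' ('or') and build the RegEx from the resulting string
const separator = new RegExp([whitespaceRun, underscoreRun, beforeCapital].join('|'));

const spaced = 'TheCat_Sat onTheMat'.split(separator).join(' ');
console.log(spaced); // 'The Cat Sat on The Mat'
```

For a short pattern a single comment above the literal is probably enough; splitting it up like this earns its keep more as the expression grows.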
I hope this article is helpful. If anyone would like to share any tips or tricks they have found particularly useful in the world of RegEx, please share them in the comments.
Thanks for reading