Using RegEx Capturing Groups
Swarnali Roy
Posted on August 27, 2021
Hello Readers!
I always love working with regular expressions, and a very important concept in RegEx is "RegEx Capturing Groups".
Sometimes the pattern we search for occurs multiple times in a string. Manually repeating that part of the regex is wasteful. A better way to handle repeated substrings is to use "RegEx Capturing Groups".
- Parentheses ( ) are used to find repeated substrings. We just put the part of the regex that will repeat between the parentheses.
- A capturing group lets us get that part of the match as a separate item in the result array (when using match() without the g flag, or exec()).
- If we put a quantifier after the parentheses, it applies to the whole group.
Let's see an example:
let regex = /(go)+/ig;
let str = 'goGoGOgOgoooogo';
let result = str.match(regex);
console.log(result);
//output: [ 'goGoGOgOgo', 'go' ]
Parentheses ( ) group characters together, so (go)+ matches "go", "gogo", "gogogo" and so on, and the i flag makes the match case-insensitive. The first match stops at the extra "o" right after the fifth "go". Because of the g flag the search continues, so when it finds another "go" later in the string, it returns that as a second match.
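Because the example above uses the g flag, match() returns only the whole matches, not the captured groups. A small sketch of how to see the group itself: call match() without g, or use matchAll() (ES2020) to get every match together with its groups. It also shows that a quantified group keeps only the text of its last repetition.

```javascript
const str = 'goGoGOgOgoooogo';

// Without the g flag, match() returns the whole match at index 0
// followed by the text of each capturing group.
const first = 'goGO'.match(/(go)+/i);
console.log(first[0]); // goGO (the whole match)
console.log(first[1]); // GO   (only the last repetition of the group is kept)

// matchAll() requires the g flag and yields one result per match,
// each carrying its own capturing groups and starting index.
const all = [...str.matchAll(/(go)+/ig)].map((m) => [m[0], m[1], m.index]);
console.log(JSON.stringify(all)); // [["goGoGOgOgo","go",0],["go","go",13]]
```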
Specify RegEx Capturing Groups using Numbers
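Let's say we need a pattern in which the same substring repeats more than once in a string. Instead of writing it again and again, we can refer back to the text that a capturing group has already matched.
To specify where that repeated string will appear, use a backslash (\) followed by a number. This number starts at 1 and increases with each additional capturing group, so \1 refers to the first group, \2 to the second, and so on. Such a reference matches the exact text the group captured, not just any text that fits the group's pattern.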
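As a small illustration of the numbering (the pattern and test strings are made up for this sketch), a regex with two capturing groups can refer back to them as \1 and \2:

```javascript
// Group 1 captures a word, group 2 captures a number.
// \1 and \2 must then repeat exactly what those groups captured.
const pair = /^(\w+)-(\d+):\1-\2$/;

console.log(pair.test('ab-12:ab-12')); // true  (both groups repeated exactly)
console.log(pair.test('ab-12:ab-13')); // false (\2 must be "12", not any number)
console.log(pair.test('ab-12:cd-12')); // false (\1 must be "ab", not any word)
```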
Example: The following regex matches a string that consists only of the same number repeated exactly three times, separated by single spaces.
let repeatNum = "93 93 93";
let wrongNum = "100 100 200 100";
let regex = /^(\d+)\s\1\s\1$/;
let result = regex.test(repeatNum);
console.log(result); //true
result = repeatNum.match(regex);
console.log(result); // [ '93 93 93', '93', index: 0, input: '93 93 93', groups: undefined ]
let wrongRes = regex.test(wrongNum);
console.log(wrongRes); //false
wrongRes = wrongNum.match(regex);
console.log(wrongRes); //null
The regex /^(\d+)\s\1\s\1$/ breaks down as follows:
(i) a caret ( ^ ) at the beginning of the regular expression matches the beginning of the string.
(ii) (\d+) is the first capturing group. It matches one or more digits (0-9).
(iii) \s matches a single whitespace character.
(iv) \1 is a backreference to the first capturing group (\d+): it matches exactly the same digits that group captured.
(v) a dollar sign ( $ ) at the end of the regular expression matches the end of the string.
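Since a quantifier applies to a whole group, the same check can be written more compactly by grouping the repeated "whitespace + backreference" part. This is an equivalent rewrite for illustration, not the pattern from the example above; note that (\s\1) becomes capturing group 2.

```javascript
// (\d+) is group 1; (\s\1){2} repeats "whitespace + the same number" exactly twice.
const compact = /^(\d+)(\s\1){2}$/;

console.log(compact.test('93 93 93'));        // true
console.log(compact.test('100 100 200 100')); // false
console.log(compact.test('93 93 93 93'));     // false: $ requires the string to end after three numbers
```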
- The first capturing group is repeated with \1, separated by whitespace. The regex matches the same number three times, like "100 100 100" or "93 93 93". Because of the ^ and $ anchors, it will not match if the number appears more (or fewer) than three times!
- regex.test(repeatNum) returns true, and repeatNum.match(regex) matches "93 93 93". regex.test(wrongNum) returns false, and since there is no match, wrongNum.match(regex) returns null.
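To see why the backreference matters, here is a sketch comparing it with simply writing (\d+) three times. The repeated pattern accepts any three numbers, while \1 insists on the same number each time.

```javascript
const sameNumber = /^(\d+)\s\1\s\1$/;       // \1 repeats the captured text
const anyNumbers = /^(\d+)\s(\d+)\s(\d+)$/; // each group matches any number

const input = '100 100 200';
console.log(sameNumber.test(input)); // false
console.log(anyNumbers.test(input)); // true

// With match(), the second regex exposes each number as its own group.
console.log(input.match(anyNumbers).slice(1)); // [ '100', '100', '200' ]
```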
RegEx Capturing Groups to Search and Replace Text in a String using string.replace()
We can make searching even more powerful by also replacing the text that matches.
string.replace() is the method to search for a pattern in a string and replace it.
- It takes two parameters.
- The first is the regex pattern we want to search for.
- The second is the string to replace the match with, or a function that returns the replacement.
- Capturing groups can be accessed in the replacement string with a dollar sign followed by the group number: $1, $2, and so on.
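The second parameter can also be a function. In this sketch, replace() calls the function with the whole match first, then one argument per capturing group, and uses its return value as the replacement. It also shows $$, which inserts a literal dollar sign in a replacement string (the "42 USD" input is just an invented example).

```javascript
const str = 'one two three';
const fixRegex = /^(\w+)\s(\w+)\s(\w+)$/;

// The callback receives the whole match, then each captured group in order.
const reversed = str.replace(fixRegex, (match, first, second, third) =>
  `${third} ${second} ${first}`.toUpperCase()
);
console.log(reversed); // THREE TWO ONE

// In a replacement string, $$ inserts a literal "$" and $1 inserts group 1.
const price = '42 USD'.replace(/^(\d+) USD$/, '$$$1');
console.log(price); // $42
```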
Example: The string "str" contains three different words, so we need three capturing groups. The regex "fixRegex" captures each word in its own group. The "replaceText" variable then rearranges those groups so that "one two three" becomes "three two one", and the result is assigned to the result variable.
let str = "one two three";
let fixRegex = /^(\w+)\s(\w+)\s(\w+)$/;
let replaceText = "$3 $2 $1";
let result = str.replace(fixRegex, replaceText);
console.log(result); //three two one
The regex /^(\w+)\s(\w+)\s(\w+)$/ breaks down as follows:
(i) a caret ( ^ ) at the beginning of the regular expression matches the beginning of the string.
(ii) \w matches a word character: a letter, a digit, or the underscore ( _ ).
(iii) + after \w matches one or more word characters in a row, so (\w+) captures a whole word.
(iv) \s matches a whitespace character.
(v) a dollar sign ( $ ) at the end of the regular expression matches the end of the string.
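To make the \w item concrete, a quick check with made-up strings: \w covers letters, digits and the underscore, but not spaces or hyphens, which is why \s is needed between the groups.

```javascript
const word = /^\w+$/; // one or more word characters, nothing else

console.log(word.test('snake_case_2')); // true: letters, digits and _ all count
console.log(word.test('well-known'));   // false: - is not a word character
console.log(word.test('one two'));      // false: a space is not a word character

// \w+ stops at the first non-word character, so each match is exactly one word.
console.log('one two three'.match(/\w+/g)); // [ 'one', 'two', 'three' ]
```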
Because the three words are different, \1 after \s will not work here: it would only match the word "one" again. Instead, we write (\w+) three times, which creates three separate capturing groups.
- The "replaceText" string swaps the 1st and 3rd capturing groups, which is done simply with dollar signs ($).
- $3 inserts the text captured by the 3rd group, "three", at the start of the replacement, and $1 puts the text of the 1st group, "one", at the end.
- The 2nd group stays in the middle, written as $2.
- The string.replace() method takes fixRegex as its first parameter and replaceText as its second, and returns "three two one", with "one" and "three" swapped.
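The same idea works beyond a fixed three-word string. As a sketch (the input is invented for illustration), dropping the ^ and $ anchors and adding the g flag makes replace() apply "$2 $1" to every pair of words, swapping each pair.

```javascript
// Without anchors and with the g flag, every "word whitespace word" pair is swapped.
const swapped = 'a b c d'.replace(/(\w+)\s(\w+)/g, '$2 $1');
console.log(swapped); // b a d c

// With an odd number of words, the last one has no partner and stays put.
const oddCount = 'a b c'.replace(/(\w+)\s(\w+)/g, '$2 $1');
console.log(oddCount); // b a c
```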
If you find this interesting, try writing a regex yourself: using RegEx Capturing Groups and the string.replace() method, turn the following string into "five three one 6 4 2".
let str = "one 2 three 4 five 6";
//output: five three one 6 4 2
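If you would like to check your answer after trying it, here is one possible solution (other patterns work too). It captures the three words and the three numbers in six groups and reorders them in the replacement string.

```javascript
let str = "one 2 three 4 five 6";
// Groups 1, 3 and 5 capture the words; groups 2, 4 and 6 capture the numbers.
let fixRegex = /^(\w+)\s(\d+)\s(\w+)\s(\d+)\s(\w+)\s(\d+)$/;
let replaceText = "$5 $3 $1 $6 $4 $2";
let result = str.replace(fixRegex, replaceText);
console.log(result); // five three one 6 4 2
```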
Questions are always welcome in the discussion section!