Regex Named Groups and Backreferences

amrdeveloper

Amr Hesham

Posted on November 29, 2021


Hi, I am Amr Hesham, a Software Engineer interested in Android development and compiler design. In this article, I will talk about a very good and useful feature, regex named groups and backreferences, with examples.

Named groups and named backreferences were first introduced in Python's re module. Microsoft's developers then supported them in .NET with a different syntax, and Java has supported them since JDK 7. Now they are supported in most modern programming languages, such as Ruby, PHP, and R.

This feature lets you give the groups in your regular expression names and refer back to those groups later in the regex. To learn more about the history of this feature and more details, I recommend the regular-expressions.info tutorials written by Jan Goyvaerts.

Let’s start with examples of how to define a named group and how to get a group by name and by index. I will use the Kotlin programming language. The concepts are the same elsewhere, but as I said before, some non-JVM languages use a different syntax for the same feature.

Suppose we want to parse the color attributes in our Android project's color.xml file and print each color's name and value. For example, here we have 3 colors:

<color name="black">#000000</color>
<color name="white">#ffffff</color>
<color name="grey">#cccccc</color>

And we want to print

black #000000
white #ffffff
grey #cccccc

We can do this task using many different techniques, but I will show you how to do it easily using regex named groups.

First, we need to create a regex that matches each attribute. Each attribute contains a type, a name, and a value, like this:

<type name="name">value</type>

So the plain regex, written as a Kotlin string (which is why the backslashes and quotes are escaped), will be

<\\w+ name=\"\\w+\">.+</\\w+>

You can use Regex101.com to test and understand your regex easily, but make sure you select the Java 8 flavor and paste the regex without the extra string-escaping backslashes (for example, \w instead of \\w).
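Because the regexes in this article are written as escaped Kotlin strings, one way to avoid the double backslashes (and get something you can paste straight into Regex101) is a Kotlin raw string. A minimal sketch, assuming the same plain regex as above; the names escapedRegex and rawRegex are just for illustration:

```kotlin
import java.util.regex.Pattern

// The regex written as an ordinary Kotlin string:
// every backslash and double quote must be escaped.
val escapedRegex = "<\\w+ name=\"\\w+\">.+</\\w+>"

// The same regex as a Kotlin raw string: no escaping is needed,
// so this text is exactly what you would paste into Regex101.
val rawRegex = """<\w+ name="\w+">.+</\w+>"""

fun main() {
    println(escapedRegex == rawRegex) // true: both hold the same characters
    println(Pattern.compile(rawRegex).matcher("<color name=\"black\">#000000</color>").matches()) // true
}
```

Raw strings only help inside Kotlin source; the regex engine receives the same pattern either way.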

Now that we have created our regex, we need to group the parts we want to extract.

What we need is the attribute name and value, so just put their sub-patterns inside ( ) like this:

<\\w+ name=\"(\\w+)\">(.+)</\\w+>

Now the attribute name will be in group number 1 and the attribute value in group number 2, because group number 0 always contains the full text matched by the whole regex.
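To see this numbering on a single line, here is a minimal sketch (describeGroups is a hypothetical helper, not part of any library) that returns every group of the first match, including group 0:

```kotlin
import java.util.regex.Pattern

// The grouped regex from above, written as a Kotlin raw string.
val attributeRegex: Pattern = Pattern.compile("""<\w+ name="(\w+)">(.+)</\w+>""")

// Returns groups 0..groupCount() of the first match, or an empty list if nothing matches.
fun describeGroups(line: String): List<String> {
    val matcher = attributeRegex.matcher(line)
    if (!matcher.find()) return emptyList()
    // groupCount() does not include group 0, hence the inclusive range starting at 0
    return (0..matcher.groupCount()).map { i -> matcher.group(i) }
}

fun main() {
    val groups = describeGroups("<color name=\"black\">#000000</color>")
    groups.forEachIndexed { i, g -> println("group $i = $g") }
}
```

For this line it prints group 0 as the whole element, group 1 as black and group 2 as #000000.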

To extract this information, we first compile the pattern with java.util.regex.Pattern:

val attributePattern = "<\\w+ name=\"(\\w+)\">(.+)</\\w+>"
val pattern = Pattern.compile(attributePattern)

Then we find every substring that matches our pattern and read groups 1 and 2 from each match:

// text holds the contents of color.xml
val matcher = pattern.matcher(text)
while (matcher.find()) {
    val attributeName = matcher.group(1)   // group 1: the name attribute
    val attributeValue = matcher.group(2)  // group 2: the color value
    println("$attributeName $attributeValue")
}

That’s it! The output will be exactly what we want.
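The snippets above don't show where text comes from, so here is a self-contained sketch that puts them together. It assumes the input is held in a hypothetical colorsXml string and wraps the loop in a hypothetical extractColors helper that collects the lines instead of printing them directly:

```kotlin
import java.util.regex.Pattern

// Sample input: the three colors from the example above.
val colorsXml = """
    <color name="black">#000000</color>
    <color name="white">#ffffff</color>
    <color name="grey">#cccccc</color>
""".trimIndent()

// Returns "name value" for every <type name="name">value</type> element in the text.
fun extractColors(text: String): List<String> {
    val pattern = Pattern.compile("<\\w+ name=\"(\\w+)\">(.+)</\\w+>")
    val matcher = pattern.matcher(text)
    val result = mutableListOf<String>()
    while (matcher.find()) {
        val attributeName = matcher.group(1)   // group 1: the name attribute
        val attributeValue = matcher.group(2)  // group 2: the element's text
        result.add("$attributeName $attributeValue")
    }
    return result
}

fun main() {
    extractColors(colorsXml).forEach(::println)
}
```

Because . does not match a line break by default, each match stays on its own line even though .+ is greedy.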

To access groups by name, all you need is to give each group a name by adding ?<NAME> right after the group's opening parenthesis, for example:

<\\w+ name=\"(?<KEY>\\w+)\">(?<VALUE>.+)</\\w+>

In Java, a group name must be a sequence of ASCII letters and digits starting with a letter, and you can’t give two groups the same name.
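A quick way to check these naming rules is to try compiling a few patterns. This sketch reflects java.util.regex only (other flavors, such as Python's, accept underscores in group names), and compiles is a hypothetical helper:

```kotlin
import java.util.regex.Pattern
import java.util.regex.PatternSyntaxException

// Returns true if java.util.regex accepts the pattern, false if compiling it throws.
fun compiles(regex: String): Boolean =
    try {
        Pattern.compile(regex)
        true
    } catch (e: PatternSyntaxException) {
        false
    }

fun main() {
    println(compiles("(?<KEY>\\w+)"))              // true: letters only
    println(compiles("(?<key2>\\w+)"))             // true: digits allowed after the first letter
    println(compiles("(?<2key>\\w+)"))             // false: must start with a letter
    println(compiles("(?<my_key>\\w+)"))           // false: underscore is not allowed
    println(compiles("(?<KEY>\\w+)-(?<KEY>\\w+)")) // false: duplicate group name
}
```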

Now, instead of getting groups by index (1 and 2), we will use KEY and VALUE:

while (matcher.find()) {
    val attributeName = matcher.group("KEY")
    val attributeValue = matcher.group("VALUE")
    println("$attributeName $attributeValue")
}

And you will get the same output :D.
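In java.util.regex a named group is still a numbered group as well, because groups are numbered by their opening parentheses whether or not they have a name. A small sketch (namedPattern and sameByNameAndIndex are hypothetical names) that reads the same match both ways:

```kotlin
import java.util.regex.Pattern

val namedPattern: Pattern = Pattern.compile("<\\w+ name=\"(?<KEY>\\w+)\">(?<VALUE>.+)</\\w+>")

// Reads each group by name and by number; true if both ways give the same text.
fun sameByNameAndIndex(line: String): Boolean {
    val matcher = namedPattern.matcher(line)
    if (!matcher.find()) return false
    return matcher.group("KEY") == matcher.group(1) &&
        matcher.group("VALUE") == matcher.group(2)
}

fun main() {
    // Naming the groups does not add extra groups: there are still 2.
    println(namedPattern.matcher("").groupCount())
    println(sameByNameAndIndex("<color name=\"white\">#ffffff</color>")) // true
}
```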

Now that we have learned what named groups are and how to use them, it’s time for backreferences.

Basically, a backreference matches the same text that was previously matched by a group. For example, suppose we want to check whether a number consists of only one repeated digit, like 1, 22, 333, or 444. How can we do this using a regex?

To use backreferences, we first need to define a group. In this case our group will be a single digit, (?<DIGIT>\d), which matches the first digit. Then we use a backreference to check that all the other digits are the same as the text matched for the first one. To do this, you can refer to the group by name with ‘\k<DIGIT>’ or by index with ‘\1’.

Our full regex will be “(?<DIGIT>\d)\k<DIGIT>*” or “(\d)\1*”. This means we expect one digit captured in a group named DIGIT, followed by zero or more repetitions of the same digit matched by this group. The full code will be like this:

import java.util.regex.Pattern

fun main() {
    // One digit captured as DIGIT, then zero or more copies of that same digit
    val repeatedDigitRegex = "(?<DIGIT>\\d)\\k<DIGIT>*"
    val pattern = Pattern.compile(repeatedDigitRegex)
    println(pattern.matcher("1").matches())
    println(pattern.matcher("22").matches())
    println(pattern.matcher("333").matches())
    println(pattern.matcher("4444").matches())
    println(pattern.matcher("10").matches())
    println(pattern.matcher("21").matches())
    println(pattern.matcher("101").matches())
}

And the output will be:

true
true
true
true
false
false
false
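To check that the named and numbered forms behave the same, and that the backreference is what enforces the "same digit" rule, here is a sketch comparing them with a plain \d+ on the same inputs (allMatch is a hypothetical helper):

```kotlin
import java.util.regex.Pattern

val namedForm: Pattern = Pattern.compile("(?<DIGIT>\\d)\\k<DIGIT>*")
val numberedForm: Pattern = Pattern.compile("(\\d)\\1*")
// Without a backreference, \d+ accepts any run of digits.
val noBackreference: Pattern = Pattern.compile("\\d+")

// Applies matches() (whole-string match) to every input.
fun allMatch(pattern: Pattern, inputs: List<String>): List<Boolean> =
    inputs.map { pattern.matcher(it).matches() }

fun main() {
    val inputs = listOf("1", "22", "333", "4444", "10", "21", "101")
    println(allMatch(namedForm, inputs))       // [true, true, true, true, false, false, false]
    println(allMatch(numberedForm, inputs))    // same as the named form
    println(allMatch(noBackreference, inputs)) // all true
}
```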

There are many ideas you can implement with this feature, for example checking whether an HTML element is opened and closed with the same tag, or checking whether something is repeated.
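A sketch of those two ideas, assuming simple input: the tag check only handles tags without attributes and without nesting of the same tag, so treat it as an illustration of backreferences rather than an HTML parser. The names sameTag and repeatedWord are hypothetical:

```kotlin
import java.util.regex.Pattern

// The opening tag name is captured as TAG; the closing tag must repeat the same name.
val sameTag: Pattern = Pattern.compile("<(?<TAG>\\w+)>.*</\\k<TAG>>")

// A whole word followed by whitespace and the same word again (e.g. "the the").
val repeatedWord: Pattern = Pattern.compile("\\b(?<WORD>\\w+)\\s+\\k<WORD>\\b")

fun main() {
    println(sameTag.matcher("<b>bold</b>").matches()) // true
    println(sameTag.matcher("<b>bold</i>").matches()) // false
    val m = repeatedWord.matcher("this is the the example")
    if (m.find()) println(m.group("WORD"))            // the
}
```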

There is another useful method in the Matcher class called replaceAll. It replaces every substring that matches the pattern with a replacement string, which can be plain text or can refer to the text matched by a group. For example, in the last example, if we want to replace each run of repeated digits with a single digit, we replace the match with just its first group.

So instead of matches, we use replaceAll and pass a reference to the group by its number, which is $1. The code will be like this:

println(pattern.matcher("2222").replaceAll("$1"))

And the printed output will be 2.
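Matcher.replaceAll also accepts a named reference of the form ${name} (Java 7 and later). Here is a sketch that collapses runs inside a longer string, with hypothetical helper names. Note that in a Kotlin string the $ before {DIGIT} has to be escaped, otherwise Kotlin would treat ${DIGIT} as a string template:

```kotlin
import java.util.regex.Pattern

val repeatedDigit: Pattern = Pattern.compile("(?<DIGIT>\\d)\\k<DIGIT>*")

// Collapses every run of one repeated digit into a single digit, using the group number.
fun collapseByIndex(text: String): String =
    repeatedDigit.matcher(text).replaceAll("$1")

// Same replacement using the group name; "\${DIGIT}" is the literal text ${DIGIT}.
fun collapseByName(text: String): String =
    repeatedDigit.matcher(text).replaceAll("\${DIGIT}")

fun main() {
    println(collapseByIndex("2222"))    // 2
    println(collapseByIndex("1112233")) // 123
    println(collapseByName("55-666-7")) // 5-6-7
}
```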

I hope you enjoyed this article. If you want to learn more about this topic, there are some useful resources.

Enjoy Programming 😋.
