Regular expression capture and replace examples

sparky

Brian Schroer

Posted on September 26, 2022

Regular expressions are incredibly powerful, but they're sometimes described as "looking like cartoon characters swearing", and the syntax can be difficult to remember.

I only find myself needing to write code to "capture" and replace values from a string a couple of times a year and always have to re-learn the syntax, so I'm blogging this for my reference. I hope it will be helpful to you also.

I'm including examples in JavaScript and C#...

Here's the scenario for the following examples: We want to log XML payment account setup requests/responses, but the request contains a bank account number:

<Name>Sparky's Bank Account</Name>
<BankRoutingNumber>123456789</BankRoutingNumber>
<BankAccountNumber>987654321</BankAccountNumber>

Bank account numbers are sensitive information, so we don't want to log them as-is.

Here's the basic Regex to find BankAccountNumber XML elements. The "\d+" (one or more digits) pattern defines the account number:

JavaScript

const regex =
    new RegExp('<BankAccountNumber>\\d+<\/BankAccountNumber>', 'g');

C#

var regex = new Regex(
    @"<BankAccountNumber>\d+<\/BankAccountNumber>", 
    RegexOptions.Compiled);

...and code using the Regex to find the bank account XML elements:

JavaScript

const matches = requestXml.matchAll(regex);
console.log([...matches]);

C#

foreach (Match match in regex.Matches(requestXml))
{
    Console.WriteLine(match.Value);
}

Results:

<BankAccountNumber>987654321</BankAccountNumber>
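
A side note on the JavaScript pattern: because it is built from a string, the backslash in "\d" has to be doubled. A regex literal is one way to avoid that. This is a minimal sketch, assuming a hypothetical `sampleXml` variable that holds the request shown earlier; the variable names are illustrative and not part of the original code.

```javascript
// Hypothetical sample payload, mirroring the request shown earlier.
const sampleXml = [
    "<Name>Sparky's Bank Account</Name>",
    '<BankRoutingNumber>123456789</BankRoutingNumber>',
    '<BankAccountNumber>987654321</BankAccountNumber>',
].join('\n');

// String form: the backslash in \d must be doubled inside a string literal.
const fromString = new RegExp('<BankAccountNumber>\\d+</BankAccountNumber>', 'g');

// Literal form: \d needs no doubling, but "/" is escaped because it delimits the literal.
const fromLiteral = /<BankAccountNumber>\d+<\/BankAccountNumber>/g;

// With the "g" flag, String.prototype.match returns an array of every matched string.
const stringMatches = sampleXml.match(fromString);
const literalMatches = sampleXml.match(fromLiteral);

console.log(stringMatches);
console.log(literalMatches);
```

Both forms should find the same single element. The string form is mainly useful when the pattern has to be assembled at runtime.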

Replacing

Let's replace the bank account numbers with asterisks:

JavaScript

const censored = requestXml.replace(
    regex,
    '<BankAccountNumber>*********</BankAccountNumber>');

console.log(censored);

C#

string censored = regex.Replace(
    requestXml,
    "<BankAccountNumber>*********</BankAccountNumber>");

Console.WriteLine(censored);

Results:

<Name>Sparky's Bank Account</Name>
<BankRoutingNumber>123456789</BankRoutingNumber>
<BankAccountNumber>*********</BankAccountNumber>
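
One detail worth illustrating on the JavaScript side: `replace` only censors every occurrence because the regex was created with the "g" flag. The sketch below assumes a hypothetical payload containing two account elements (the sample request above has only one), and the variable names are made up for the illustration.

```javascript
// Hypothetical payload with two account elements (not from the original request).
const twoAccountsXml =
    '<BankAccountNumber>987654321</BankAccountNumber>\n' +
    '<BankAccountNumber>111222333</BankAccountNumber>';

const replacement = '<BankAccountNumber>*********</BankAccountNumber>';

// Same pattern, with and without the global flag.
const globalRegex = /<BankAccountNumber>\d+<\/BankAccountNumber>/g;
const firstOnlyRegex = /<BankAccountNumber>\d+<\/BankAccountNumber>/;

const allCensored = twoAccountsXml.replace(globalRegex, replacement);
const firstCensored = twoAccountsXml.replace(firstOnlyRegex, replacement);

console.log(allCensored.includes('111222333'));   // false: both numbers were replaced
console.log(firstCensored.includes('111222333')); // true: only the first match was replaced
```

In C#, `Regex.Replace` replaces every match by default, so there is no equivalent flag to forget.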

Capturing

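A capture group is defined by enclosing part of the regex ("\d+" in this example) in parentheses. The pattern below is shown as it appears inside the JavaScript string, where the backslash is doubled; in the C# verbatim string it is written "(\d+)":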

<BankAccountNumber>(\\d+)<\/BankAccountNumber>

The entire match (the XML element in this example) is automatically captured as group 0, so each match yields two groups: the full match and the parenthesized account number:

JavaScript

const matches = requestXml.matchAll(regex);
for (const match of matches) {
    const captures = [...match];
    for (var i = 0; i < captures.length; i++) {
        console.log(`[${i}] ${captures[i]}`)
    }
}

C#

foreach (Match match in regex.Matches(requestXml))
{
    for (int i = 0; i < match.Groups.Count; i++)
    {
        Console.WriteLine($"[{i}] {match.Groups[i]}");
    }
}

Results:

[0] <BankAccountNumber>987654321</BankAccountNumber>
[1] 987654321
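
Capture groups can also be referenced from a replacement string, which ties capturing back to replacing. The sketch below is one possible variation, not the approach used in the original code: it captures the opening and closing tags instead of the digits, then reuses them with `$1` and `$2` so the tag name is not typed twice. The regex and variable names are illustrative.

```javascript
// Groups 1 and 2 capture the tags; the digits between them are left uncaptured.
const taggedRegex = /(<BankAccountNumber>)\d+(<\/BankAccountNumber>)/g;

const xmlLine = '<BankAccountNumber>987654321</BankAccountNumber>';

// $1 and $2 in the replacement string are substituted with the captured tags.
const censoredLine = xmlLine.replace(taggedRegex, '$1*********$2');

console.log(censoredLine); // <BankAccountNumber>*********</BankAccountNumber>
```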

Named captures

You can name a capture group with the syntax "(?<name>pattern)". In this example, I'm using the name "acctNum":

<BankAccountNumber>(?<acctNum>\\d+)<\/BankAccountNumber>

(The angle bracket syntax for group naming is a bit confusing for this example because it looks like XML.)

Using a named capture:

JavaScript

const matches = requestXml.matchAll(regex);
for (const match of matches) {
    console.log(match.groups.acctNum);
}

C#

foreach (Match match in regex.Matches(requestXml))
{
    Console.WriteLine(match.Groups["acctNum"].Value);
}

Results:

987654321
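
Named groups work in JavaScript replacement strings too, using `$<name>`. The following sketch assumes two extra, hypothetical group names ("open" and "close") that are not in the original pattern. It also shows destructuring a named group out of a match.

```javascript
// "open" and "close" are illustrative names added for this sketch; "acctNum" is the name used above.
const wrappedRegex =
    /(?<open><BankAccountNumber>)(?<acctNum>\d+)(?<close><\/BankAccountNumber>)/g;

const accountLine = '<BankAccountNumber>987654321</BankAccountNumber>';

// Take the first match from the iterator and pull the named group out of it.
const [firstMatch] = accountLine.matchAll(wrappedRegex);
const { acctNum } = firstMatch.groups;
console.log(acctNum); // 987654321

// $<open> and $<close> are substituted with the text captured by those named groups.
const censoredByName = accountLine.replace(wrappedRegex, '$<open>*********$<close>');
console.log(censoredByName); // <BankAccountNumber>*********</BankAccountNumber>
```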

Replacing with a function

We shouldn't log account numbers, but let's say our security policy allows logging them "masked". Let's try replacing with a "callback" function that replaces all but the last four digits with asterisks:

JavaScript

const censored = requestXml.replaceAll(regex, (match, acctNum) => {
    // The callback receives the full match first, then each capture group
    // in order, so the second argument is the captured account number.
    const len = acctNum.length;
    const masked = (len > 4)
        ? '*'.repeat(len - 4) + acctNum.slice(-4)
        : '*'.repeat(len);
    return `<BankAccountNumber>${masked}</BankAccountNumber>`;
});

console.log(censored);

C#

string censored = regex.Replace(requestXml, match =>
{
    string accountNumber = match.Groups["acctNum"].Value;
    int len = accountNumber.Length;
    string masked = (len > 4) 
        ? new string('*', len - 4) + accountNumber.Substring(len - 4) 
        : new string('*', len);
    return $"<BankAccountNumber>{masked}</BankAccountNumber>";
});

Results:

<Name>Sparky's Bank Account</Name>
<BankRoutingNumber>123456789</BankRoutingNumber>
<BankAccountNumber>*****4321</BankAccountNumber>
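
The masking rule itself can be pulled into a small helper, which makes the short-number case easier to check. This is a sketch under stated assumptions: `maskAccountNumber`, `VISIBLE_DIGITS`, and the two-element input are hypothetical names and data introduced for the illustration.

```javascript
// Number of trailing digits left readable (4, matching the rule described above).
const VISIBLE_DIGITS = 4;

// Hypothetical helper: mask everything except the last four digits.
// Numbers of four digits or fewer are masked completely.
function maskAccountNumber(acctNum) {
    const len = acctNum.length;
    return len > VISIBLE_DIGITS
        ? '*'.repeat(len - VISIBLE_DIGITS) + acctNum.slice(-VISIBLE_DIGITS)
        : '*'.repeat(len);
}

const maskRegex = /<BankAccountNumber>(?<acctNum>\d+)<\/BankAccountNumber>/g;

// Hypothetical input: one normal account number and one very short one.
const maskInput =
    '<BankAccountNumber>987654321</BankAccountNumber>\n' +
    '<BankAccountNumber>1234</BankAccountNumber>';

// The callback's second argument is the first capture group (acctNum).
const maskedXml = maskInput.replace(maskRegex, (match, acctNum) =>
    `<BankAccountNumber>${maskAccountNumber(acctNum)}</BankAccountNumber>`);

console.log(maskedXml);
// <BankAccountNumber>*****4321</BankAccountNumber>
// <BankAccountNumber>****</BankAccountNumber>
```

Masking a four-digit number completely avoids logging the whole value when it is no longer than the visible suffix.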

.Net "Fiddle" containing the C# code excerpted above

JS "Fiddle" containing the JavaScript code excerpted above
