Regular expression capture and replace examples
Brian Schroer
Posted on September 26, 2022
Regular expressions are incredibly powerful, but they're sometimes described as "looking like cartoon characters swearing", and the syntax can be difficult to remember.
I only need to write code that "captures" and replaces values in a string a couple of times a year, and I always have to re-learn the syntax, so I'm blogging this for my own reference. I hope it's helpful to you too.
I'm including examples in JavaScript and C#...
Here's the scenario for the following examples: We want to log XML payment account setup requests/responses, but the request contains a bank account number:
<Name>Sparky's Bank Account</Name>
<BankRoutingNumber>123456789</BankRoutingNumber>
<BankAccountNumber>987654321</BankAccountNumber>
Bank account numbers are sensitive information, so we don't want to log them as-is.
Here's the basic Regex to find BankAccountNumber XML elements. The "\d+" (one or more digits) pattern defines the account number:
JavaScript
const regex =
  new RegExp('<BankAccountNumber>\\d+<\/BankAccountNumber>', 'g');
C#
var regex = new Regex(
    @"<BankAccountNumber>\d+<\/BankAccountNumber>",
    RegexOptions.Compiled);
...and code using the Regex to find the bank account XML elements:
JavaScript
// match[0] is the full matched text
for (const match of requestXml.matchAll(regex)) {
  console.log(match[0]);
}
C#
foreach (Match match in regex.Matches(requestXml))
{
    Console.WriteLine(match.Value);
}
Results:
<BankAccountNumber>987654321</BankAccountNumber>
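If you want to try the search without a real request, here's a minimal self-contained sketch. The sampleXml string is just the payload from above pasted into a variable (an assumption about what requestXml might hold), and it uses a regex literal instead of the RegExp constructor, which avoids doubling the backslash in "\d".

```javascript
// Sample payload, assumed from the scenario above.
const sampleXml = [
  "<Name>Sparky's Bank Account</Name>",
  '<BankRoutingNumber>123456789</BankRoutingNumber>',
  '<BankAccountNumber>987654321</BankAccountNumber>',
].join('\n');

// Regex literal form of the same pattern; matchAll requires the "g" flag.
const accountElementRegex = /<BankAccountNumber>\d+<\/BankAccountNumber>/g;

// Each match is array-like; index 0 holds the full matched text.
const foundElements = [...sampleXml.matchAll(accountElementRegex)].map(m => m[0]);
console.log(foundElements);
```

Only the BankAccountNumber element is found here. The routing number is also nine digits, but it is skipped because the surrounding tag names are part of the pattern.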
Replacing
Let's replace the bank account numbers with asterisks:
JavaScript
const censored = requestXml.replace(
  regex,
  '<BankAccountNumber>*********</BankAccountNumber>');
console.log(censored);
C#
string censored = regex.Replace(
    requestXml,
    "<BankAccountNumber>*********</BankAccountNumber>");
Console.WriteLine(censored);
Results:
<Name>Sparky's Bank Account</Name>
<BankRoutingNumber>123456789</BankRoutingNumber>
<BankAccountNumber>*********</BankAccountNumber>
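One detail worth checking in JavaScript is the "g" flag: with it, replace() censors every account number, and without it only the first. Here's a small sketch of the difference. The two-account input string is made up for illustration.

```javascript
// Hypothetical input containing two account elements.
const twoAccounts =
  '<BankAccountNumber>111</BankAccountNumber>' +
  '<BankAccountNumber>222</BankAccountNumber>';
const maskElement = '<BankAccountNumber>*********</BankAccountNumber>';

// Without "g": only the first element is replaced.
const firstOnly = twoAccounts.replace(
  /<BankAccountNumber>\d+<\/BankAccountNumber>/, maskElement);

// With "g": every element is replaced.
const allMasked = twoAccounts.replace(
  /<BankAccountNumber>\d+<\/BankAccountNumber>/g, maskElement);

console.log(firstOnly);
console.log(allMasked);
```

replaceAll() is stricter: when it is given a regex without the "g" flag, it throws a TypeError instead of silently replacing only the first match.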
Capturing
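A capture group is defined by enclosing part of the regex ("\d+" in this example) in parentheses. Here's the pattern as a JavaScript string, where the backslash before "d" is doubled; in a C# verbatim (@"...") string the group stays "(\d+)":
<BankAccountNumber>(\\d+)<\/BankAccountNumber>
The entire match (the XML element in this example) is always available as group 0, so each match exposes two groups: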
JavaScript
const matches = requestXml.matchAll(regex);
for (const match of matches) {
  const captures = [...match];
  for (let i = 0; i < captures.length; i++) {
    console.log(`[${i}] ${captures[i]}`);
  }
}
C#
foreach (Match match in regex.Matches(requestXml))
{
    for (int i = 0; i < match.Groups.Count; i++)
    {
        Console.WriteLine($"[{i}] {match.Groups[i]}");
    }
}
Results:
[0] <BankAccountNumber>987654321</BankAccountNumber>
[1] 987654321
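Capture groups can also be referenced from a replacement string as "$1", "$2", and so on. As a sketch (this group layout is an alternative chosen for illustration, not the pattern used above), capturing the opening and closing tags instead of the digits lets the replacement reuse them without retyping the element name:

```javascript
// Group 1 = opening tag, group 2 = closing tag; the digits are not captured.
const tagRegex = /(<BankAccountNumber>)\d+(<\/BankAccountNumber>)/g;
const accountElement = '<BankAccountNumber>987654321</BankAccountNumber>';

// $1 and $2 are expanded to the text captured by each group.
const redacted = accountElement.replace(tagRegex, '$1*********$2');
console.log(redacted); // <BankAccountNumber>*********</BankAccountNumber>
```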
Named captures
You can name a capture group with the syntax "(?<name>pattern)". In this example, I'm using the name "acctNum":
<BankAccountNumber>(?<acctNum>\\d+)<\/BankAccountNumber>
(The angle bracket syntax for group naming is a bit confusing for this example because it looks like XML.)
Using a named capture:
JavaScript
const matches = requestXml.matchAll(regex);
for (const match of matches) {
  console.log(match.groups.acctNum);
}
C#
foreach (Match match in regex.Matches(requestXml))
{
    Console.WriteLine(match.Groups["acctNum"].Value);
}
Results:
987654321
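For completeness, here's a self-contained sketch of the named capture as a regex literal, collecting the account numbers into an array. The payload string and variable names are assumptions for illustration. It also shows that a named group is still numbered, so it can be read by position as well.

```javascript
// Hypothetical payload with one routing number and one account number.
const payload =
  '<BankRoutingNumber>123456789</BankRoutingNumber>\n' +
  '<BankAccountNumber>987654321</BankAccountNumber>';

// "acctNum" names the first (and only) capture group.
const namedRegex = /<BankAccountNumber>(?<acctNum>\d+)<\/BankAccountNumber>/g;

const namedMatches = [...payload.matchAll(namedRegex)];
const accountNumbers = namedMatches.map(m => m.groups.acctNum);
console.log(accountNumbers);

// The same text is available by index, because named groups are numbered too.
const sameByIndex = namedMatches.every(m => m[1] === m.groups.acctNum);
console.log(sameByIndex);
```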
Replacing with a function
We shouldn't log account numbers, but let's say our security policy allows logging them "masked". Let's try replacing with a "callback" function that replaces all but the last four digits with asterisks:
JavaScript
const censored = requestXml.replaceAll(regex, (match, acctNum) => {
  // The callback receives the full match, then each capture group,
  // so acctNum is the text captured by the first group.
  const len = acctNum.length;
  const masked = (len > 4)
    ? '*'.repeat(len - 4) + acctNum.slice(-4)
    : '*'.repeat(len);
  return `<BankAccountNumber>${masked}</BankAccountNumber>`;
});
console.log(censored);
C#
string censored = regex.Replace(requestXml, match =>
{
    string accountNumber = match.Groups["acctNum"].Value;
    int len = accountNumber.Length;
    string masked = (len > 4)
        ? new string('*', len - 4) + accountNumber.Substring(len - 4)
        : new string('*', len);
    return $"<BankAccountNumber>{masked}</BankAccountNumber>";
});
Console.WriteLine(censored);
Results:
<Name>Sparky's Bank Account</Name>
<BankRoutingNumber>123456789</BankRoutingNumber>
<BankAccountNumber>*****4321</BankAccountNumber>
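If masking is needed in more than one place, the callback can be wrapped in a small helper. This is only a sketch: maskAccountNumbers and its visibleDigits parameter are hypothetical names, and the policy of fully masking short numbers mirrors the "len > 4" check above.

```javascript
// Hypothetical helper: masks every BankAccountNumber element in an XML string.
function maskAccountNumbers(xml, visibleDigits = 4) {
  const pattern = /<BankAccountNumber>(\d+)<\/BankAccountNumber>/g;
  // The callback receives the full match first, then each capture group.
  return xml.replace(pattern, (_fullMatch, acctNum) => {
    // Numbers no longer than visibleDigits are masked entirely.
    const hidden = acctNum.length > visibleDigits
      ? acctNum.length - visibleDigits
      : acctNum.length;
    const masked = '*'.repeat(hidden) + acctNum.slice(hidden);
    return `<BankAccountNumber>${masked}</BankAccountNumber>`;
  });
}

console.log(maskAccountNumbers('<BankAccountNumber>987654321</BankAccountNumber>'));
// <BankAccountNumber>*****4321</BankAccountNumber>
console.log(maskAccountNumbers('<BankAccountNumber>1234</BankAccountNumber>'));
// <BankAccountNumber>****</BankAccountNumber>
```

Masking short numbers completely avoids logging a four-digit account number in full, which would defeat the purpose of the mask.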