Async Replace All for JavaScript

keestalkstech

Kees C. Bakker

Posted on April 25, 2023

I love the replaceAll string API in JavaScript, as it makes replacing a string far more intuitive than the "good old" global regular expression. This week I had to replace strings with the results of async calls. Yeah, that is not supported by any API in standard JavaScript.

The idea

So, let's create a replaceAllAsync that looks like this:

export async function replaceAllAsync(
  input: string,
  regex: RegExp,
  replacement: (match: RegExpMatchArray) => Promise<string>,
  mode: PromiseExecutionMode = PromiseExecutionMode.All
): Promise<string> {
  // implementation
}

export enum PromiseExecutionMode {
  // Uses Promise.all, which is the fastest, but may overwhelm
  // whatever the replacement function calls
  All,
  // Uses an await on each replacement, so they run one after another
  ForEach,
}

The input is the input string that we'll work on. The regex is the regular expression that will be used to find matches that need to be replaced. Every match is fed to the replacement function, which will be awaited. The mode indicates if we should process all the promises at once or one by one.
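To make the difference between the two modes concrete, here is a minimal sketch. The `fakeLookup` function and the `log` array are made up for illustration; they stand in for whatever async work the replacement function does.

```typescript
// Hypothetical async call: records when it starts and when it ends.
async function fakeLookup(key: string, log: string[]): Promise<string> {
  log.push(`start ${key}`)
  await new Promise(resolve => setTimeout(resolve, 10))
  log.push(`end ${key}`)
  return key.toUpperCase()
}

// All at once: every call is started before any of them is awaited.
async function allAtOnce(keys: string[], log: string[]): Promise<string[]> {
  return Promise.all(keys.map(key => fakeLookup(key, log)))
}

// One by one: the next call only starts after the previous one has finished.
async function oneByOne(keys: string[], log: string[]): Promise<string[]> {
  const results: string[] = []
  for (const key of keys) {
    results.push(await fakeLookup(key, log))
  }
  return results
}
```

With the keys a and b, the first variant starts both calls before either one ends, while the second finishes a before it starts b. Both return their results in input order.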

In the body of the function, we need to:

  1. Capture all the matches and feed them to the replacement function.
  2. Split the input on the regular expression.
  3. Stitch all the components back together into a string.
  4. Return that string.

Easy, right? Well... it turns out there are a few caveats when it comes to regular expressions.

Caveats of reusing a global regex

Consider the following code:

const str = "Numb3r!1"
const regex = /\d+/g

console.log("Test 1", regex.test(str)) // true
console.log("Test 2", regex.test(str)) // true
console.log("Test 3", regex.test(str)) // false
console.log("Test 4", regex.test(str)) // true

Why? It turns out the RegExp object is stateful when the global flag is set:

JavaScript RegExp objects are stateful when they have the global or sticky flags set (e.g., /foo/g or /foo/y). They store a lastIndex from the previous match. Using this internally, test() can be used to iterate over multiple matches in a string of text (with capture groups).

Source: MDN Web Docs - RegExp.prototype.test()

The lastIndex is changed per test! So when you reuse a global regex, you might not get what you think. We'll create a new regular expression based on the given one.
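As a small sketch of that idea: a copy built from `source` and `flags` starts with its own `lastIndex` of 0, whatever state the original is in. The helper name `cloneRegex` is an assumption for illustration only.

```typescript
// Hypothetical helper: copy a regex so the caller's lastIndex is never touched.
function cloneRegex(regex: RegExp): RegExp {
  return new RegExp(regex.source, regex.flags)
}

const shared = /\d+/g
shared.test("Numb3r!1") // matches "3" and moves shared.lastIndex forward
const indexAfterTest = shared.lastIndex

// The copy does not inherit the state of the original.
const copy = cloneRegex(shared)
const copyIndex = copy.lastIndex
const copyMatches = copy.test("Numb3r!1")

console.log(indexAfterTest, copyIndex, copyMatches)
```

Testing the copy moves only the copy's `lastIndex`; the shared regex keeps the position it had.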

Caveats of split with capture groups

Consider the following code:

const str = "I have 12 bananas and 3 apples!"

// no groups:
const regex1 = /\d+/g
console.log("Test 1", str.split(regex1))
// [ 'I have ', ' bananas and ', ' apples!' ]

// capture groups:
const regex2 = /(\d+)/g
console.log("Test 2", str.split(regex2))
// [ 'I have ', '12', ' bananas and ', '3', ' apples!' ]

// non-capture groups:
const regex3 = /(?:\d+)/g
console.log("Test 3", str.split(regex3))
// [ 'I have ', ' bananas and ', ' apples!' ]

Conclusion: if you use capture groups in your regular expression, the captured text ends up in the split result.
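Why this matters for stitching: without capture groups, a split always yields exactly one more part than there are matches, so parts and matches can be interleaved. The sketch below reuses the example string from above; the variable names are illustrative.

```typescript
const text = "I have 12 bananas and 3 apples!"

const matches = Array.from(text.matchAll(/\d+/g)).map(m => m[0])
const parts = text.split(/\d+/g)

// Interleave the parts with the (here: unchanged) matches to rebuild the input.
let rebuilt = parts[0]
for (let i = 0; i < matches.length; i++) {
  rebuilt += matches[i] + parts[i + 1]
}

// The same split with a capture group yields extra entries,
// so the part count no longer lines up with the match count.
const partsWithGroup = text.split(/(\d+)/g)

console.log(parts.length, matches.length, partsWithGroup.length)
```

If the matches were swapped for replacement strings, the same loop would produce the replaced text.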

Caveats of matchAll

So, when you want to capture all the matches of a string you can call str.matchAll(regex). But this only works for global regular expressions: a regex without the g flag makes matchAll throw a TypeError.
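A short sketch of that behavior and one way around it. The helper name `toGlobal` is an assumption for illustration; it returns a copy of the regex with the g flag appended when it is missing.

```typescript
// Hypothetical helper: return a copy of the regex that has the g flag.
function toGlobal(regex: RegExp): RegExp {
  const flags = regex.flags.includes("g") ? regex.flags : regex.flags + "g"
  return new RegExp(regex.source, flags)
}

const nonGlobal = /\d/

// matchAll rejects a regex without the g flag.
let threwTypeError = false
try {
  "a1b2".matchAll(nonGlobal)
} catch (error) {
  threwTypeError = error instanceof TypeError
}

// With the g flag added, all matches are returned.
const found = Array.from("a1b2".matchAll(toGlobal(nonGlobal))).map(m => m[0])

console.log(threwTypeError, found)
```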

Implications

Based on the caveats, we have to do a few things:

  • It is better not to reuse the given regular expression, but to create a new one from it. This prevents problems with lastIndex.
  • Convert the regular expression to a global one. The end user should already know, but this might make it a bit easier, and it prevents us from having to debug all the time.
  • Use 2 regular expressions: one for matching and one for splitting. The regular expression used for splitting should have its capturing groups replaced by non-capturing groups.

The code

Now, let's implement the function using what we know:

export async function replaceAllAsync(
  input: string,
  regex: RegExp,
  replacement: (match: RegExpMatchArray) => Promise<string>,
  mode: PromiseExecutionMode = PromiseExecutionMode.All
): Promise<string> {
  // replace all implies global, so append if it is missing
  const addGlobal = !regex.flags.includes("g")
  let flags = regex.flags
  if (addGlobal) flags += "g"

  // get matches
  let matcher = new RegExp(regex.source, flags)
  const matches = Array.from(input.matchAll(matcher))

  if (matches.length == 0) return input

  // construct all replacements
  let replacements: Array<string>
  if (mode == PromiseExecutionMode.All) {
    replacements = await Promise.all(matches.map(match => replacement(match)))
  } else {
    // PromiseExecutionMode.ForEach: await each replacement in order
    replacements = new Array<string>()
    for (const m of matches) {
      const r = await replacement(m)
      replacements.push(r)
    }
  }

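  // change capturing groups (plain and named) into non-capturing groups
  // for split, because capturing groups are added to the parts array;
  // lookaheads and lookbehinds such as (?= and (?<! are left untouched
  let source = regex.source.replace(
    /(?<!\\)\((?:\?<[A-Za-z_$][\w$]*>|(?!\?))/g,
    "(?:"
  )
  let splitter = new RegExp(source, flags)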

  const parts = input.split(splitter)

  // stitch everything back together
  let result = parts[0]
  for (let i = 0; i < replacements.length; i++) {
    result += replacements[i] + parts[i + 1]
  }

  return result
}

export enum PromiseExecutionMode {
  // Uses Promise.all, which is the fastest, but may overwhelm
  // whatever the replacement function calls
  All,
  // Uses an await on each replacement, so they run one after another
  ForEach,
}
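For comparison, here is a sketch of a different stitching strategy, not the approach used above: it skips the second splitter regex and slices the input using the index of each match. The names `replaceAllAsyncByIndex` and `double` are assumptions for illustration, and this variant always uses Promise.all.

```typescript
// Sketch only: an index-based variant under a hypothetical name.
async function replaceAllAsyncByIndex(
  input: string,
  regex: RegExp,
  replacement: (match: RegExpMatchArray) => Promise<string>
): Promise<string> {
  // work on a global copy, so the caller's regex is left alone
  const flags = regex.flags.includes("g") ? regex.flags : regex.flags + "g"
  const matches = Array.from(input.matchAll(new RegExp(regex.source, flags)))
  const replacements = await Promise.all(matches.map(m => replacement(m)))

  // copy the text between matches and put each replacement in its place
  let result = ""
  let cursor = 0
  matches.forEach((m, i) => {
    const start = m.index ?? 0
    result += input.slice(cursor, start) + replacements[i]
    cursor = start + m[0].length
  })
  return result + input.slice(cursor)
}

// Hypothetical async replacement: doubles the captured number.
async function double(match: RegExpMatchArray): Promise<string> {
  return String(Number(match[1]) * 2)
}

const demo = replaceAllAsyncByIndex("I have 12 bananas and 3 apples!", /(\d+)/, double)
demo.then(text => console.log(text))
```

Because nothing is split, capture groups in the pattern need no rewriting here. The trade-off is that the stitching depends on the match positions instead of on split.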

Conclusion

JavaScript is not always as intuitive as I would like it to be. But with regular expressions, you can build some pretty powerful async replacements.
