Search and replace tricks with ripgrep
Sundeep
Posted on September 17, 2020
ripgrep (command name rg) is a grep tool, but supports search and replace as well. rg is far from a like-for-like alternative to sed, but it has nifty features like multiline replacement, fixed string matching, PCRE2 support, etc. This post gives an overview of the syntax for substitution and highlights some of the cases where rg is a handy replacement for sed.
Global search and replace
$ cat ip.txt
dark blue, light blue
light orange
blue sky
# by default, line numbers are displayed if the output destination is a terminal
# by default, only lines that matched the given pattern are displayed
# 'blue' is search pattern and -r 'red' is replacement string
$ rg 'blue' -r 'red' ip.txt
1:dark red, light red
3:red sky
# --passthru option is useful to print all lines, whether or not they matched
# -N will disable line number prefix
# this command is similar to: sed 's/blue/red/g' ip.txt
$ rg --passthru -N 'blue' -r 'red' ip.txt
dark red, light red
light orange
red sky
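The replacement string can also refer to parts of the match. Here's a small sketch (the sample input is made up for illustration): $1, $2, etc. refer to capture groups, and the ${1} form is needed when the group reference is immediately followed by a letter, digit or underscore.

```shell
# swap two comma separated fields using capture groups
swapped=$(echo 'good,bad' | rg -N '(\w+),(\w+)' -r '$2,$1')
echo "$swapped"

# braces prevent $1x from being read as a (non-existent) group named '1x'
suffixed=$(echo 'good,bad' | rg -N '(\w+),(\w+)' -r '${1}x,${2}y')
echo "$suffixed"
```

The first command should print bad,good and the second one goodx,bady.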
Matching Nth occurrence
As seen in the previous example, rg will search and replace all occurrences. So, you'll have to be creative with the regexp to replace only a specific occurrence per input line.
$ s='see bat hot at but at go gate at sat at but at'
# replace first occurrence only
# same as: sed 's/\bat\b/[xyz]/'
$ echo "$s" | rg --passthru -N '\bat\b(.*)' -r '[xyz]$1'
see bat hot [xyz] but at go gate at sat at but at
# same as: sed 's/\bat\b/[xyz]/3'
# the number within {} is N-1 to replace Nth occurrence, for N>1
$ echo "$s" | rg --passthru -N '^((.*?\bat\b){2}.*?)\bat\b' -r '$1[xyz]'
see bat hot at but at go gate [xyz] sat at but at
# replace the last but Nth occurrence, for N>=0 (the number within {} is N)
$ echo "$s" | rg --passthru -N '^(.*)\bat\b((.*\bat\b){3})' -r '$1[xyz]$2'
see bat hot at but [xyz] go gate at sat at but at
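The Nth occurrence recipe above can be wrapped in a small function so that the {N-1} arithmetic is done for you. This is only a sketch: replace_nth is a hypothetical helper name, the word is assumed to be free of regexp metacharacters and the replacement is assumed to be free of $ characters.

```shell
# replace_nth WORD N REPLACEMENT
# replaces only the Nth whole-word occurrence of WORD per line, for N>1
# hypothetical helper: WORD must not contain regexp metacharacters
# and REPLACEMENT must not contain $ characters
replace_nth() {
    word=$1 n=$2 repl=$3
    # \\b becomes \b and \${1} becomes ${1} within double quotes
    rg --passthru -N "^((.*?\\b$word\\b){$((n - 1))}.*?)\\b$word\\b" -r "\${1}$repl"
}

s='see bat hot at but at go gate at sat at but at'
third=$(echo "$s" | replace_nth 'at' 3 '[xyz]')
echo "$third"
```

With N as 3, this builds the same regexp as the sed 's/\bat\b/[xyz]/3' equivalent shown earlier, so the third whole-word at is the one that gets replaced.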
In-place workaround
rg doesn't support an in-place option, so you'll have to do it yourself.
# -N isn't needed here as output destination is a file
# same as: sed -i 's/blue/red/g' ip.txt
$ rg --passthru 'blue' -r 'red' ip.txt > tmp.txt && mv tmp.txt ip.txt
$ cat ip.txt
dark red, light red
light orange
red sky
If you have moreutils installed, then you could use sponge as well.
$ rg --passthru 'blue' -r 'red' ip.txt | sponge ip.txt
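If you need this often, the temporary file dance can be put in a function. A minimal sketch, assuming mktemp is available: rg_inplace is a hypothetical helper (not something rg provides) and ip_demo.txt is a throwaway file created just for the demo.

```shell
# rg_inplace PATTERN REPLACEMENT FILE (hypothetical helper, not part of rg)
rg_inplace() {
    tmp=$(mktemp) || return 1
    # -e and -- guard against patterns or filenames that start with a hyphen
    if rg --passthru -N -e "$1" -r "$2" -- "$3" > "$tmp"; then
        # copy the content back so that the permissions of FILE are kept
        cat "$tmp" > "$3"
    fi
    rm -f "$tmp"
}

printf 'dark blue, light blue\nlight orange\nblue sky\n' > ip_demo.txt
rg_inplace 'blue' 'red' ip_demo.txt
cat ip_demo.txt
```

If rg reports a non-zero exit status (an error, or possibly no match at all), the file is left untouched, which is the right result in both cases.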
Rust regex and PCRE2
By default, rg uses Rust regular expressions, which are much more feature-rich than those of GNU sed. The main feature not supported is backreferences within the regexp definition (for performance reasons). See the Rust regex documentation for regular expression syntax and features. rg supports Unicode by default.
# non-greedy quantifier is supported
$ s='food land bark sand band cue combat'
$ echo "$s" | rg --passthru 'foo.*?ba' -r '[xyz]'
[xyz]rk sand band cue combat
# unicode support
$ echo 'fox:αλεπού,eagle:αετός' | rg --passthru '\p{L}+' -r '($0)'
(fox):(αλεπού),(eagle):(αετός)
# set operator example, remove all punctuation characters except . ! and ?
$ para='"hi", there! how *are* you? all fine here.'
$ echo "$para" | rg --passthru '[[:punct:]--[.!?]]+' -r ''
hi there! how are you? all fine here.
The -P switch will enable the PCRE2 flavor, which has even more tricks. You can also use --engine=auto to allow rg to automatically use PCRE2 when needed (for example, as an alias for the rg command, so that you get the performance of the Rust engine by default and PCRE2 only when needed).
# backreference within regexp definition
$ s='cocoa appleseed tool speechless'
$ echo "$s" | rg --passthru -wP '([a-z]*([a-z])\2[a-z]*){2}' -r '{$0}'
cocoa {appleseed} tool {speechless}
# replace all whole words except 'imp' and 'ant'
$ s='tiger imp goat eagle ant important'
$ echo "$s" | rg --passthru -P '\b(imp|ant)\b(*SKIP)(*F)|\w+' -r '[$0]'
[tiger] imp [goat] [eagle] ant [important]
# recursively match parentheses
$ eqn='(3+a)x * y((r-2)*(t+2)/6) + z(a(b(c(d(e)))))'
$ echo "$eqn" | rg --passthru -P '\((?:[^()]++|(?0))++\)' -r ''
x * y + z
# all lowercase letters and optional hyphen combo from start of string
$ s='apple-fig-mango guava grape'
$ echo "$s" | rg --passthru -P '\G([a-z]+)(-)?' -r '($1)$2'
(apple)-(fig)-(mango) guava grape
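Here's a quick sketch of the --engine=auto behavior mentioned earlier. It assumes your rg was built with PCRE2 support (you can check with rg --pcre2-version) and is recent enough to have the --engine option. The sample input is made up for illustration.

```shell
# the backreference \1 isn't supported by the Rust engine,
# so --engine=auto is expected to switch to PCRE2 for this pattern
doubled=$(echo 'cocoa appleseed tool' | rg --engine=auto -ow '\w*(\w)\1\w*')
echo "$doubled"

# a pattern without such features works with the default engine
plain=$(echo 'cocoa appleseed tool' | rg --engine=auto -ow 't\w+')
echo "$plain"
```

The first command should print appleseed and tool on separate lines, without needing the -P switch. To make this the default, you could add something like alias rg='rg --engine=auto' to your shell startup file.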
Extract and modify
The -r option can be used when the -o option is active too. The example shown below is not easy to do with sed.
$ s='0501 035 154 12 26 98234'
# numbers >= 100 and ignore leading zeros
$ echo "$s" | rg -woP '0*+(\d{3,})' -r '"$1"' | paste -sd,
"501","154","98234"
Fixed string matching
Like grep, the -F option will allow fixed strings to be matched, a handy option that I feel every search and replace tool should provide.
$ printf '2.3/[4]*6\nfoo\n5.3-[4]*9\n' | rg --passthru -F '[4]*' -r '2'
2.3/26
foo
5.3-29
-F doesn't extend to the replacement section though, so you need $$ instead of a single $ character to represent it literally.
$ echo 'a.*{2}-b' | rg --passthru -F '.*{2}' -r '+$x\tc'
a+\tc-b
$ echo 'a.*{2}-b' | rg --passthru -F '.*{2}' -r '+$$x\tc'
a+$x\tc-b
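When the replacement text comes from a variable, the $ characters can be doubled programmatically. A minimal sketch, where escape_replacement is a hypothetical helper name:

```shell
# hypothetical helper: double every $ so that rg treats it literally
escape_replacement() {
    printf '%s' "$1" | sed 's/\$/$$/g'
}

repl=$(escape_replacement '+$x\tc')
# printf is used instead of echo to avoid \t being interpreted by some shells
printf '%s\n' "$repl"

out=$(echo 'a.*{2}-b' | rg --passthru -F '.*{2}' -r "$repl")
printf '%s\n' "$out"
```

The escaped string is +$$x\tc, so the final output matches the second example above: a+$x\tc-b.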
Multiline matching
Another handy option is -U, which enables multiline matching.
$ s='hi there\nhave a nice day\nbye'
# (?s) flag will allow . to match newline characters as well
$ printf '%b' "$s" | rg --passthru -U '(?s)the.*ice' -r ''
hi  day
bye
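You can also match the newline character explicitly instead of using the (?s) flag. A small sketch with made-up input, where . is still restricted to non-newline characters:

```shell
# \n can be used in the pattern when -U is active
# . doesn't match newline characters here since the (?s) flag isn't used
block=$(printf 'start {\n  body\n} end\nnext\n' |
        rg --passthru -NU '\{\n.*\n\}' -r '{...}')
echo "$block"
```

The three lines making up the block should collapse to start {...} end, followed by the untouched next line.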
Handling dos-style input
rg provides support for dos-style files with the --crlf option.
# same as: sed -E 's/\w+(\r?)$/xyz\1/'
# note that output will retain CR+LF as line ending
# similar to the sed solution, this will work for unix-style input too
$ printf 'hi there\r\ngood day\r\n' | rg --passthru --crlf '\w+$' -r 'xyz'
hi xyz
good xyz
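Carriage returns aren't visible on the terminal, so here's one way to check the line endings yourself. This sketch assumes your cat supports the -v option, which displays the carriage return character as ^M.

```shell
# pipe the output through cat -v to make the carriage returns visible
visible=$(printf 'hi there\r\ngood day\r\n' |
          rg --passthru --crlf '\w+$' -r 'xyz' | cat -v)
echo "$visible"
```

If the CR+LF line endings are retained as noted above, each output line will end with ^M.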
Speed comparison with GNU sed
Another advantage of rg is that it is likely to be faster than sed. See the benchmark of ripgrep against other grep implementations, written by ripgrep's author, for a methodical, detailed analysis and insights.
# for small files, initial processing time of rg is a large component
$ time echo 'aba' | sed 's/a/b/g' > f1
real 0m0.002s
$ time echo 'aba' | rg --passthru 'a' -r 'b' > f2
real 0m0.007s
# for larger files, rg is likely to be faster
# 6.2M sample ASCII file
$ wget 'https://norvig.com/big.txt'
$ time LC_ALL=C sed 's/\bcat\b/dog/g' big.txt > f1
real 0m0.060s
$ time rg --passthru '\bcat\b' -r 'dog' big.txt > f2
real 0m0.048s
$ diff -s f1 f2
Files f1 and f2 are identical
# nearly 8 times faster!!
$ time LC_ALL=C sed -E 's/\b(\w+)(\s+\1)+\b/\1/g' big.txt > f1
real 0m0.725s
$ time rg --no-unicode --passthru -wP '(\w+)(\s+\1)+' -r '$1' big.txt > f2
real 0m0.093s
$ diff -s f1 f2
Files f1 and f2 are identical
Other alternatives for sed