Ruby's gsub method: What, Why, & How

wlytle

Posted on November 7, 2020

If you are like me and possess an incomplete understanding of regular expressions (regex), then trying to read the documentation on string manipulation might have left you pulling your hair out. It often feels like these resources are written entirely in regex, and if you left your Rosetta Stone in your other wallet, you may find yourself befuddled. This post dives mainly into Ruby's gsub method. However, because gsub gets a lot of its power from regex, this post will use regex, and I will explain it as it comes up. For more detailed regex information, check out these great resources:

gsub... What?

Okay, what is gsub? It's a string manipulation method in Ruby; the g stands for global and the sub stands for substitution. Put simply, gsub looks through a string for a pattern, replaces every match with a replacement, and returns the modified string. Ruby also has a sub method that performs the same task on only the first match. Both gsub and sub are non-destructive; to modify the original string, Ruby offers gsub! and sub!. gsub can be called with 2 arguments or with 1 argument and a block:

string.gsub(search_pattern, replacement)
# or
string.gsub(search_pattern) do |sp|
  # do something fancy with each found search_pattern
end

For the search pattern, gsub accepts a string (stuff in-between ' ' or " ") or a regex (in Ruby, a regex literal is usually written between / /). The replacement is normally a string; later in this post we'll see that it can also be a hash.
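
To make the difference between sub, gsub, and the bang versions concrete, here is a small sketch. The variable names are made up for illustration.

```ruby
original = "cheese"

# sub replaces only the first match; gsub replaces every match.
first_only = original.sub('e', '3')
every_one  = original.gsub('e', '3')

# Neither call changed the receiver, so this copy is still "cheese".
unchanged = original.dup

# The bang version modifies the string it is called on.
mutable = original.dup
mutable.gsub!('e', '3')

# The block form receives each match and substitutes the block's return value.
shouted = original.gsub(/e/) { |match| match.upcase }

puts first_only  # ch3ese
puts every_one   # ch33s3
puts unchanged   # cheese
puts mutable     # ch33s3
puts shouted     # chEEsE
```

One gotcha worth knowing: gsub! returns nil when nothing was replaced, so it is safer not to chain other methods off of it.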

gsub... Why?

Why bother with gsub? Well, because the world is messy and, by extension, so is data. Let's say you're collecting phone numbers from a database that didn't properly validate its data format, and every so often you get 123.456.7899 or (123)456-7899 instead of 1234567899. Wouldn't it be nice to let Ruby find and fix this for you rather than waiting for your program to crash and then manually re-formatting? I, for one, say yes!

gsub... How?

Alright, now we're down to the good stuff. How do I make Ruby do all of my string-related bidding? First, the case of a simple string substitution.

"cheese".gsub('e', '3')
# => "ch33s3"
"cheese".gsub(/[^e]/, '3')
# => "33ee3e"    

The regex [^e] simply means "any single character that isn't e". With this first use case, we already have the ability to fix our phone number format.

phone_number = '(123)456-7899'
phone_number.gsub(/[().-]/, '')
# => "1234567899"

This bit: [().-] is looking for any of the symbols within the brackets, which gsub then replaces with nothing! The hyphen goes last on purpose: between two characters, as in the )-. of [()-.], a hyphen describes a range of characters rather than a literal -.
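
If the goal is simply "keep the digits, drop everything else", a slightly different technique is to match on \D (any non-digit) instead of listing each symbol. This is a sketch that assumes every non-digit character is unwanted; raw_numbers is just sample data taken from the formats mentioned earlier.

```ruby
raw_numbers = ['123.456.7899', '(123)456-7899', '1234567899']

# \D matches any character that is not a digit, so gsub strips everything else.
cleaned = raw_numbers.map { |number| number.gsub(/\D/, '') }

puts cleaned.inspect
# ["1234567899", "1234567899", "1234567899"]
```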

Let's say you have some string data that contains prices, but you are expanding your market to Japan. Whatever will you do? gsub to the rescue!

data = 'Plane tickets are $200'
conversion = 104.72 # yen/dollar according to google at time of writing
data.gsub(/\d+/) { |amount| amount.to_f * conversion }
# => "Plane tickets are $20944.0"

You're probably thinking, "okay, the number is correct, but it still says dollars, so... not super helpful". That's fair. If only there were some way to swap one character for another, hmm... gsub to the rescue again! We can take advantage of method chaining.

data.gsub(/\d+/) { |amount| amount.to_f * conversion }.gsub('$', '¥')
# => "Plane tickets are ¥20944.0"

Ruby is looking for all collections of one or more digits (\d+), converting each of those to a float, then multiplying it by our conversion factor. The block's return value is turned back into a string and dropped in where the match was. After it has done this with every run of digits, the resulting string gets passed to another gsub and has all $ replaced with ¥.
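
The same conversion can also be done in a single pass by capturing the digits that follow a dollar sign. This is only a sketch of one alternative: the rounding to whole yen is my own assumption, not something the original example does.

```ruby
data = 'Plane tickets are $200'
conversion = 104.72 # yen per dollar, the rate used above

# \$ matches a literal dollar sign and (\d+) captures the digits after it.
# Inside the block, $1 holds that captured text.
converted = data.gsub(/\$(\d+)/) { "¥#{($1.to_f * conversion).round}" }

puts converted
# Plane tickets are ¥20944
```

Because the pattern requires the $ in front, numbers that aren't prices are left alone.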

This is all well and good, but by now you must be wondering, "will gsub help me pretend to be a spy?" Yes... Yes, it can!

As a second argument gsub can also take a hash. If any of the matched patterns from the first argument exist as keys of the hash they will be swapped out for the corresponding value. It's like a super-secret decoder ring!

substitutions = {
  'a' => '@',
  'e' => '3',
  'i' => '!',
}
phrase = 'I am hiding and I have come for your cheese'
phrase.gsub(/[aei]/, substitutions)
# => "I @m h!d!ng @nd I h@v3 com3 for your ch33s3"

The regex here, [aei], is just matching each of those characters individually. Notice that the capital I's were unchanged because we were only searching for lowercase letters. Be careful, though: if the search had matched I as well, it would not have been left alone. Our substitutions hash doesn't have an I key, and a match with no key in the hash is replaced with an empty string.
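
Here is a short sketch of that missing-key behavior, along with one way around it. The /i flag (case-insensitive matching) and the fetch fallback are my additions for illustration.

```ruby
substitutions = { 'a' => '@', 'e' => '3', 'i' => '!' }
phrase = 'I am hiding'

# With /i the search also matches "I", but the hash has no 'I' key,
# so that match is replaced with an empty string.
dropped = phrase.gsub(/[aei]/i, substitutions)

# Block form: fall back to the matched text itself when the key is missing.
kept = phrase.gsub(/[aei]/i) { |match| substitutions.fetch(match, match) }

puts dropped.inspect  # " @m h!d!ng"
puts kept.inspect     # "I @m h!d!ng"
```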

One more gsub use case to explore before we part ways. By putting search terms in parentheses we can group them, then reference those groups to perform specific manipulations. The groups are part of one pattern, so they must be found next to each other in the order written; if the whole pattern doesn't match anywhere, gsub just returns a copy of the original string.

phrase = "cheese"
phrase.gsub(/(h)([e]+)/, '{\1}<\2>')
# => "c{h}<ee>se"

In our search pattern, we look first for h and then for one or more e's ([e]+). We can then reference these groups with a backslash followed by the group's number, such as '\1', counting groups from the left. Group numbers can be a bit tricky to figure out, so here's one more example:

phrase.gsub(/(h)([e]+)(k)([0-9])([^aeiou])/...

Here h is the first group and would be referenced as '\1', [e]+ is the second, '\2', and so on until [^aeiou], which would be group 5, referenced as '\5'. It should also be noted that when referencing groups it's better to use single quotes: inside double quotes Ruby treats "\1" as an escape sequence for a control character, so the backslash has to be doubled ("\\1") for gsub to see the group reference.
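
A quick sketch of both points. The string "heek7z" is made up purely so that the five-group pattern has something to match, and the dashes in the replacement are only there to make the groups easy to see.

```ruby
phrase = "cheese"

# Single quotes keep the backslash, so \1 and \2 reach gsub intact.
single = phrase.gsub(/(h)(e+)/, '{\1}<\2>')

# In double quotes the backslash has to be doubled to get the same result.
doubled = phrase.gsub(/(h)(e+)/, "{\\1}<\\2>")

# "\1" in double quotes is the character with code 1, not a group reference.
is_control_char = ("\1" == "\u0001")

# Groups are numbered left to right: h, ee, k, 7, z.
reversed = "heek7z".gsub(/(h)([e]+)(k)([0-9])([^aeiou])/, '\5-\4-\3-\2-\1')

puts single           # c{h}<ee>se
puts doubled          # c{h}<ee>se
puts is_control_char  # true
puts reversed         # z-7-k-ee-h
```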

We can also give the groups names to reference like so:

phrase = "cheese"
phrase.gsub(/(?<h>h)(?<es>[e]+)/, '{\k<h>}<\k<es>>')
# => "c{h}<ee>se"

Group names are assigned by (?<group_name>search_term) and are referenced by \k<group_name>. We can do some other fun manipulations this way. For instance, we can rearrange terms, just in case we need to generate some Pig Latin.

phrase = "cheese"
phrase.gsub(/([^aeiou]+)(\w+)/, '\2\1ay')
# => "eesechay"

This looks for one or more characters that aren't lowercase vowels, [^aeiou]+, which for a plain lowercase word means the leading consonants. This becomes group \1. Then it looks for one or more word characters \w+, which becomes group \2. It then switches the order of the groups and adds ay! Note: This will only Pig Latin-ify individual lowercase words that don't start with a vowel.
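
To stretch this to a whole sentence, one possible sketch combines named groups with \b (a word boundary). The pig_latin method name, the group names, and the explicit consonant ranges are all my own assumptions. Words that start with a vowel are simply left untouched here, which is a simplification of real Pig Latin.

```ruby
# Hypothetical helper for illustration only.
def pig_latin(sentence)
  # \b pins the match to the start of a word.
  # onset = the leading run of consonants, rest = whatever follows (may be empty).
  pattern = /\b(?<onset>[b-df-hj-np-tv-z]+)(?<rest>\w*)/i
  sentence.gsub(pattern, '\k<rest>\k<onset>ay')
end

puts pig_latin("cheese and pickles")
# eesechay and icklespay
```

Listing the consonants explicitly, rather than using [^aeiou], keeps spaces and punctuation from being swallowed into the first group.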

Wrap Up

Hopefully, this has demystified one of Ruby's more powerful string manipulation methods. A lot of fancy stuff can be done with gsub even without regex, although learning some regex will kick it up a notch.
