Understanding Regular Expressions once and for all [PART 3]
Svenja Schäfer
Posted on May 8, 2020
Originally published at: Codegram's blog
Welcome back to the third edition of Understanding Regular Expressions once and for all. By now, you shouldn't freak out any longer when seeing something like this: /^\d$/. In case you are really scared, you might want to read the first parts of this series (again): Understanding Regular Expressions once and for all [PART 1] and Understanding Regular Expressions once and for all [PART 2]. If you are just a little bit unsure, let me recap what we've learned:
HEATING UP
These are the characters we know about (and don't forget the literal characters):
^ - the hat/caret character matches at the start of a line/string
$ - the dollar sign matches at the end of a line/string
\w - matches word characters (letters, digits and the underscore)
\d - matches digits
\s - matches whitespace characters, including line breaks
\b - the word boundary pattern
. - matches any character except line breaks
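If you like to see the recap in action, here is a minimal sketch. It assumes JavaScript, because the /.../ notation used in this series matches JavaScript's regex literals; the sample string and the names in `checks` are made up for illustration.

```javascript
// A made-up sample line to try each recap character on.
const sample = "Oat milk: 125 ml";

const checks = {
  startsWithOat: /^Oat/.test(sample),         // ^  anchors to the start
  endsWithMl: /ml$/.test(sample),             // $  anchors to the end
  firstWordChar: sample.match(/\w/)[0],       // \w first word character
  firstDigit: sample.match(/\d/)[0],          // \d first digit
  firstWhitespaceAt: sample.search(/\s/),     // \s index of first whitespace
  milkIsAWord: /\bmilk\b/.test(sample),       // \b boundaries around "milk"
  milIsAWord: /\bmil\b/.test(sample),         // "mil" is followed by "k", so no boundary
  dotMatchesLineBreak: /./.test("\n"),        // .  does not match a line break
};

console.log(checks);
```

Only `milIsAWord` and `dotMatchesLineBreak` come out as false; `firstWordChar` is "O", `firstDigit` is "1" and `firstWhitespaceAt` is 3.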
And here's the most important sentence from part 1:
Do not read the word itself but each character separately.
With that in mind, let's have a look at the expression from the beginning: /^\d$/. We have the opening slash, which indicates that a regular expression starts. After that, we see the caret character. So the character we are looking for is located at the very beginning of a line. The following backslash should make you aware that the next character is somehow important. As it's a d, we know it's an expression to match digits. The last character before the closing slash is the dollar sign, which means the digit must sit at the end of a line. Confused? Great. /^\d$/ doesn't mean anything else than matching these options:
0
1
2
3
4
5
6
7
8
9
There is no character before the digit, nor after. Just the numbers 0 to 9.
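To convince yourself, you could run a quick check like the following. It is only a sketch and assumes JavaScript; the test strings in `candidates` are made up.

```javascript
// /^\d$/ : start of string, exactly one digit, end of string.
const singleDigit = /^\d$/;

// Made-up inputs: two single digits, then things that should fail.
const candidates = ["0", "7", "10", "a", " 5", ""];

const results = candidates.map((text) => [text, singleDigit.test(text)]);

for (const [text, matched] of results) {
  console.log(JSON.stringify(text), matched);
}
```

Only "0" and "7" pass. "10" has two digits, "a" is not a digit, " 5" starts with a space and the empty string has no digit at all.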
Truth be told, most of the time, we want more. A single character is a bit lame.
GOOD THINGS COME IN THREES
Spoon, knife, fork. Yellow, red, blue. Regular expression quantifiers. Well, honestly, there are a lot more than three, but the three quantifiers I'll show you here today are a fabulous start. Just believe me.
As we said, /^\d$/ will match a single character only. But, and see how easily I get the transition to my second favourite topic: food (right after regular expressions), what if we are looking for all digits at the beginning of each line of an ingredients list? It makes a huge difference whether you have to use 1 tablespoon of soy sauce or 10, right? Right. So how do we do that? We know it must be at the beginning of a line, so the caret should definitely be part of it. And the expression to match digits. So far, our regular expression would look like this: /^\d/.
And here's the recipe we'll work with today:
Vegan Mayonnaise
120 g Cashew nuts
60 ml Olive oil
125 ml Oat milk
2 tsp Apple vinegar
1 tsp Maple syrup (or another sweetener of your choice)
Salt
We would match 1, 6, 1, 2 and 1. That's bad. I don't wanna know what mayonnaise tastes like with 1 cashew nut, 6 ml olive oil and 1 ml oat milk. So, we want to match all digits. Say hello to the + character. The plus sign indicates that the preceding character appears at least once, with no upper limit. The important bit is the at least once part. So if we end up with a regular expression like this: /^\d+/ we would match all the digits at the beginning of each line: 120, 60, 125, 2 and 1. The + can be used after every other regular expression character as well. For example /Mayon+aise/ would match Mayonnaise, Mayonaise and even Mayonnnnnnnaise. But not Mayoaise. And let's be honest, who doesn't have trouble with this word and prefers sticking to the short version Mayo instead?!
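Here is how the difference between /^\d/ and /^\d+/ could look on the recipe. This is a sketch that assumes JavaScript; the g and m flags are my addition (g collects every match, m lets ^ match at the start of each line instead of only at the start of the whole string).

```javascript
// The recipe from above as one multi-line string.
const recipe = [
  "Vegan Mayonnaise",
  "120 g Cashew nuts",
  "60 ml Olive oil",
  "125 ml Oat milk",
  "2 tsp Apple vinegar",
  "1 tsp Maple syrup (or another sweetener of your choice)",
  "Salt",
].join("\n");

// Without +, only the first digit of each line is matched.
const firstDigits = recipe.match(/^\d/gm);
// With +, all leading digits of each line are matched.
const allDigits = recipe.match(/^\d+/gm);

console.log(firstDigits); // [ '1', '6', '1', '2', '1' ]
console.log(allDigits);   // [ '120', '60', '125', '2', '1' ]

// + works on literal characters too: one or more "n" after "Mayo".
const spellings = ["Mayonnaise", "Mayonaise", "Mayonnnnnnnaise", "Mayoaise"];
const plusMatches = spellings.filter((word) => /Mayon+aise/.test(word));

console.log(plusMatches); // everything except 'Mayoaise'
```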
Okay, with the + sign, we finally caught all our ingredients. Kinda. We're missing the salt. Personally, I like salt, so I don't want to miss this part of the recipe either. But instead of +, where the digit must exist at least once, I can use *. When using the asterisk, or (little) star, the preceding character gets matched zero or more times. So with /^\d*/, we would match the lines that start with a number and the ones without (for those, the match is simply empty). And with the * character, we covered two of three exciting quantifiers.
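A small sketch of + versus * on the same recipe, checked line by line. It assumes JavaScript; the property names `star` and `plus` are made up for this example.

```javascript
// The recipe lines from above, one string per line.
const lines = [
  "Vegan Mayonnaise",
  "120 g Cashew nuts",
  "60 ml Olive oil",
  "125 ml Oat milk",
  "2 tsp Apple vinegar",
  "1 tsp Maple syrup (or another sweetener of your choice)",
  "Salt",
];

const rows = lines.map((line) => ({
  line,
  // * always finds a match; it is the empty string when no digit leads the line.
  star: /^\d*/.exec(line)[0],
  // + needs at least one digit, so lines without a leading number fail.
  plus: /^\d+/.test(line),
}));

for (const row of rows) {
  console.log(JSON.stringify(row.star), row.plus, row.line);
}
```

With *, the Salt line matches with an empty result, and so does the title line Vegan Mayonnaise, which is worth keeping in mind before using * everywhere.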
NOT SURE?
The third quantifier is fairly similar to the asterisk (*) sign. You use it when you're not sure if the character before it is there or not. The difference is that if the character exists, it will exist once, not two, three or endless times. And it looks like this: ?. A question mark. The word Mayonnaise is a nice example for this one again. One or two "n"s? I said that we match the preceding character, so what would the regular expression look like?
If you have something like this /Mayon?aise/ I must ask you to give it another try. At least if we want to match the correct spelling: this one matches Mayonaise and Mayoaise, but not Mayonnaise. Remember, the ? matches characters that exist zero times or one time. The correct regular expression would be this: /Mayonn?aise/: maybe the second "n" exists, maybe not.
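To compare the two attempts side by side, you could try something like this. It is a sketch assuming JavaScript; the word list and the names `firstTry` and `secondTry` are made up for illustration.

```javascript
// Made-up spellings to test both expressions against.
const words = ["Mayonnaise", "Mayonaise", "Mayoaise", "Mayonnnaise"];

const firstTry = /Mayon?aise/;   // zero or one "n" in total
const secondTry = /Mayonn?aise/; // one "n", plus an optional second one

const comparison = words.map((word) => ({
  word,
  firstTry: firstTry.test(word),
  secondTry: secondTry.test(word),
}));

console.table(comparison);
```

The first try accepts Mayonaise and Mayoaise only. The second try accepts Mayonnaise and Mayonaise, and rejects the version with three "n"s because ? allows one extra "n" at most.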
I hope there aren't any more question marks over your head for the moment. But if so, don't hesitate to reach out to us! And get excited for the next part of Understanding Regular Expressions once and for all.
Photo by Mitchell Luo on Unsplash