Recognize Regex Easily
andy_801 🔫🇺🇦
Posted on February 16, 2020
Looking at the docs for Regular Expressions, it seems there are a lot of notations you need to learn and memorize. That can be overwhelming if you don't use regex frequently or have just started using it. Here I will try to showcase the basic regex parts that were important for me to know and understand in order to become familiar with regex. For details, you can always check the MDN Regex Docs or other sources.
Regex
Regular expressions are patterns for searching and parsing strings, and the rules are largely the same across programming languages. A regex might look like this:
Generally, it can be seen as consisting of these 4 parts (slashes, anchors, pattern, flags), where:
- Slashes / are used to enclose a regex pattern in JS, similar to quotes for a 'string', for example.
- ^, $, and flags are optional anchors and flags.
- Pattern is a character combination to be used in a search.
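As a small sketch of these parts, the snippet below builds one regex literal and reads its pieces back. The pattern `hello` and the variable names are made up for illustration; `source` and `flags` are standard properties of JavaScript RegExp objects.

```javascript
// Hypothetical example pattern: anchors ^ and $, the pattern "hello", and the i flag.
const greeting = /^hello$/i;

// The text between the slashes (anchors included) is available as .source,
// and the letters after the closing slash as .flags.
console.log(greeting.source);
console.log(greeting.flags);

// With both anchors the whole string has to be "hello" (in any letter case).
const matchesWholeString = greeting.test("Hello");
const matchesInsideText = greeting.test("hello world");
console.log(matchesWholeString, matchesInsideText);
```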
So, for example, the regex /x/ will search for the first occurrence of the character x in a string:
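A minimal sketch of that search, using a sample string invented for this example:

```javascript
// Hypothetical sample string, not taken from the original post.
const sample = "text box";

// Without flags, match() stops at the first occurrence.
const firstMatch = sample.match(/x/);

console.log(firstMatch[0]);    // the matched text
console.log(firstMatch.index); // position of the first "x" in the string
```

Here the first `x` sits at index 2, and the second `x` in "box" is not reported.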
If you want to find all matches for the pattern, you can use the /g flag at the end, which stands for global search:
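The same kind of search with the global flag could look like this (the sample string is again an assumption):

```javascript
// Hypothetical sample string, not taken from the original post.
const sample = "text box";

// With the g flag, match() returns every occurrence as an array of strings.
const allMatches = sample.match(/x/g);

console.log(allMatches);
console.log(allMatches.length); // one entry per "x" found
```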
JavaScript supports several flags that can be added at the end of a regex for special settings (6 when this post was written; newer JavaScript versions add more). The most used are /g, to search for all matches, and /i, for case-insensitive searching.
Different characters can also be combined to search for a sequence:
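As an illustration, the sketch below searches for the character sequence `regex`, first case-sensitively and then with the /i flag added. The sentence is a made-up sample.

```javascript
// Hypothetical sample sentence, not taken from the original post.
const sentence = "Regex rules! I use regex daily.";

// The characters r, e, g, e, x must appear in exactly this order.
const caseSensitive = sentence.match(/regex/g);

// Adding i makes the same sequence match "Regex" as well.
const caseInsensitive = sentence.match(/regex/gi);

console.log(caseSensitive.length, caseInsensitive.length);
```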
Pattern
Besides this, any pattern can be seen as a sequence of rules.
For example, to describe a pattern for the time 12:00, I can write something like this:
/ (should be 2 digits) (then colon) (then should be 2 digits) /
or in terms of regex:
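One way to write those three rules, assuming "2 digits" is expressed as `[0-9]{2}` (the variable name is hypothetical):

```javascript
// (2 digits) (colon) (2 digits)
const generalTime = /[0-9]{2}:[0-9]{2}/;

console.log(generalTime.test("12:00")); // a valid time matches
console.log(generalTime.test("25:00")); // but so does an impossible hour

// Inside a longer string the pattern matches the first place where the rules fit.
const insideText = "score is 160:740".match(generalTime);
console.log(insideText[0]);
```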
Of course, this is a very general time pattern, since it will also match strings like 25:00 and score is 160:740. Try it here.
Sequence
Each sequence can also be seen as a pair of Token & Quantity.
A Token describes which character, set of characters, or special symbol to search for. A Quantity says how many times it occurs (the number of repeats). For the time pattern, a sequence like [0-9]{2} reads: any character from 0 to 9, occurring twice in a row. When the quantity isn't specified, you can assume the token occurs only once, like the colon : in the time pattern. Basically, it will be the same as:
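To make the implicit quantity visible, here is a sketch comparing the time pattern with and without an explicit `{1}` on the colon. The pattern spelling and the test inputs are assumptions for illustration.

```javascript
// The colon has no quantity here, so it is expected exactly once.
const implicitOnce = /[0-9]{2}:[0-9]{2}/;

// Same pattern with the quantity of the colon written out.
const explicitOnce = /[0-9]{2}:{1}[0-9]{2}/;

// Hypothetical inputs: both patterns should agree on every one of them.
const inputs = ["12:00", "1:00", "12::00", "score is 160:740"];
const sameResults = inputs.every(
  (input) => implicitOnce.test(input) === explicitOnce.test(input)
);

console.log(sameResults);
```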
You will meet plenty of quantity symbols, like +, ?, *, {n}, and {n,m}, all used to describe how many repetitions of the preceding token there should be. For example, with u* the character u could be present 0 or more times:
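A short sketch of three of these quantity symbols applied to the token `u`. The word list and patterns are invented for this example.

```javascript
// Hypothetical word list, not taken from the original post.
const words = ["color", "colour", "colouur"];

const zeroOrMore = /colou*r/; // u may appear 0 or more times
const oneOrMore = /colou+r/;  // u must appear at least once
const optional = /colou?r/;   // u may appear 0 or 1 times

const withStar = words.filter((word) => zeroOrMore.test(word));
const withPlus = words.filter((word) => oneOrMore.test(word));
const withQuestion = words.filter((word) => optional.test(word));

console.log(withStar, withPlus, withQuestion);
```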
Ranges and Groups
Frequently, you will see tokens as a set of nested sequences. These sequences can be defined as ranges and groups. A range set is defined with brackets, [set of characters], and it defines possible options rather than a strict sequence. So /[cat]/ reads as "could be any of the characters c, a, t" rather than just the word cat. So the regex below will have 6 matches in the next string:
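The original sample string is not available here, so the sketch below uses a made-up string that also happens to produce 6 matches:

```javascript
// Hypothetical sample string, chosen so that the set matches 6 characters.
const sample = "cat act";

// [cat] matches any single character that is c, a, or t.
const setMatches = sample.match(/[cat]/g);

// Without brackets the pattern is the literal sequence c, a, t.
const wordMatches = sample.match(/cat/g);

console.log(setMatches.length, wordMatches.length);
```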
Back to the time parsing example, let's create a regex for hours. Regex treats numbers as single digits between 0 and 9. So to match, e.g., 16 hours, we need to set rules for the possible values of the first and the second digit of the hours. We will also split them into 2 possible sets:
/ (should be number between 00-19) or (number between 20-23) /
The next regex defines a match that starts with one token, which is either 0 or 1, followed by another token, which is one digit from 0 to 9.
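A possible way to write that rule with two range sets (the variable names are hypothetical):

```javascript
// First digit: 0 or 1. Second digit: anything from 0 to 9.
const upTo19 = /[01][0-9]/;

const shouldMatch = ["00", "09", "16", "19"].every((hours) => upTo19.test(hours));
const matches23 = upTo19.test("23"); // first digit 2 is not in [01]

console.log(shouldMatch, matches23);
```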
This way we describe the numbers from 00 to 19. Now we can describe hours within the 20-23 period more precisely. The rules will be like this:
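One way to sketch the rule for this second set, assuming a literal 2 followed by a digit from 0 to 3:

```javascript
// First digit: exactly 2. Second digit: 0, 1, 2, or 3.
const from20To23 = /2[0-3]/;

const shouldMatch = ["20", "21", "22", "23"].every((hours) => from20To23.test(hours));
const matches24 = from20To23.test("24"); // 4 is outside [0-3]
const matches19 = from20To23.test("19"); // no 2 to start with

console.log(shouldMatch, matches24, matches19);
```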
In regex, the pipe symbol | is used as the OR operator, and parentheses (group) are used to group things. Combining these with the rules for the colon and the minutes gives this regex for time:
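Putting the pieces together could look like the sketch below. It assumes the minutes keep the earlier "2 digits" rule, written as `[0-9]{2}`; the original post may have used a stricter rule for minutes.

```javascript
// (00-19 OR 20-23) (colon) (2 digits)
const time = /([01][0-9]|2[0-3]):[0-9]{2}/;

console.log(time.test("16:30")); // valid hour from the first set
console.log(time.test("23:59")); // valid hour from the second set
console.log(time.test("25:00")); // rejected: 25 fits neither set
```

Without the parentheses, the | would split the whole pattern into two alternatives instead of only the two hour sets.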
You can test it here ->
Anchors
The special anchors ^ at the start and $ at the end of a pattern are used to match the beginning and the end of the string. You can use both when you want to restrict input to only what the regex rules allow. In the case of the time regex:
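A sketch of the difference the anchors make, reusing an assumed time pattern with `[0-9]{2}` for the minutes (names and inputs are hypothetical):

```javascript
// Same rules, once without and once with anchors.
const looseTime = /([01][0-9]|2[0-3]):[0-9]{2}/;
const strictTime = /^([01][0-9]|2[0-3]):[0-9]{2}$/;

// Unanchored: a time anywhere inside the string is enough.
console.log(looseTime.test("it is 16:30 now"));

// Anchored: the whole string must be a time and nothing else.
console.log(strictTime.test("it is 16:30 now"));
console.log(strictTime.test("16:30"));
console.log(strictTime.test("16:300"));
```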
Additional Resources:
Here are a few resources to play with. Don't forget to check the community regexes in the sidebar menu for some inspiration.