Regular Expressions - not so regular!
Utkarsh Dhiman
Posted on May 17, 2020
Table Of Contents
- Modifiers (or flags)
- Brackets
- Meta Characters
- Quantifiers
- Miscellaneous
- Real World applications
- Summary
Introduction
Regular expressions (aka RegEx or RegExp) are a powerful tool widely used for pattern matching within strings. They are supported by almost all programming languages; however, the exact syntax and usage vary from language to language. Here, I'll be using JavaScript. The website regex101 lets you try regular expressions even without knowing a programming language.
What does it look like?
This code will make sense as you proceed further.
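As a small illustration (a sketch with a made-up pattern, not necessarily the snippet the author had in mind), a regular expression in JavaScript can be written as a literal between slashes or built with the `RegExp` constructor:

```javascript
// A regex literal: the pattern sits between the slashes, flags follow the closing slash.
// This hypothetical pattern means: one or more letters followed by exactly two digits.
const pattern = /^[a-z]+\d{2}$/i;

// Equivalent constructor form; backslashes must be doubled inside a string.
const samePattern = new RegExp("^[a-z]+\\d{2}$", "i");

const matchesLiteral = pattern.test("Hello42");         // true
const matchesConstructor = samePattern.test("Hello42"); // true
const rejectsOneDigit = pattern.test("Hello4");         // false
```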
How to use regular expressions
It depends largely on the language. In JavaScript there are several methods for this, e.g. match(), test() and search() are widely used. In MySQL, the REGEXP operator (or its synonym RLIKE) is used for regex searches, while LIKE supports only simple wildcard patterns; PostgreSQL uses the ~ operator for regex matching.
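A minimal sketch of the three JavaScript methods mentioned above, using an arbitrary sample sentence chosen for illustration:

```javascript
const sentence = "The rain in Spain stays mainly in the plain";

// test() is called on the regex and returns true or false.
const hasAin = /ain/.test(sentence);        // true

// match() is called on the string; with the g flag it returns every match.
const allAin = sentence.match(/ain/g);      // ["ain", "ain", "ain", "ain"]

// search() returns the index of the first match, or -1 if there is none.
const firstIndex = sentence.search(/ain/);  // 5
const notFound = sentence.search(/xyz/);    // -1
```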
Regular expressions are usually built from quantifiers, operators and modifiers (or flags).
Modifiers
Modifier | Description |
---|---|
'g' | global search: it doesn't stop after the first match and returns all matches |
'i' | case-insensitive search |
'm' | multi-line search: ^ and $ match at the start and end of every line, not only at the start and end of the whole string |
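
To see what each modifier changes, here is a short sketch on a made-up two-line string:

```javascript
const sample = "cat Cat\ncat";

// Without g, match() stops at the first match.
const first = sample.match(/cat/);           // first[0] is "cat", first.index is 0

// g: return all matches.
const all = sample.match(/cat/g);            // ["cat", "cat"]

// i: ignore case, so "Cat" is found as well.
const anyCase = sample.match(/cat/gi);       // ["cat", "Cat", "cat"]

// Without m, ^ matches only at the very start of the string.
const stringStart = sample.match(/^cat/g);   // ["cat"]

// m: ^ also matches right after each line break.
const lineStarts = sample.match(/^cat/gm);   // ["cat", "cat"]
```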
Brackets
Brackets | Description |
---|---|
[abc] | matches any one of the characters between the square brackets (like an OR operator in programming languages) |
[a-z] | matches any lower-case letter |
[A-Z] | matches any upper-case letter |
[A-Za-z] | matches any upper-case or lower-case letter (note that [A-z] also matches the symbols [ \ ] ^ _ and the backtick, which sit between Z and a in ASCII) |
[0-9] | matches any digit |
[^abc] | matches any character other than a, b or c (^ means negation only at the start of square brackets; elsewhere it is the start anchor) |
(p|q) | matches p or q. Parentheses are the grouping operator: they divide the expression into sub-groups. Each sub-group is numbered starting from 1, and that number is what the back-reference operator uses |
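
The bracket forms in the table can be tried out as follows; the sample strings are arbitrary:

```javascript
// [aeu]: any one of the listed characters.
const listed = "regular".match(/[aeu]/g);        // ["e", "u", "a"]

// [^0-9]: any character that is not a digit.
const nonDigits = "a1b2".match(/[^0-9]/g);       // ["a", "b"]

// [A-z] spans more than letters: "_" lies between "Z" and "a" in ASCII.
const looseRange = /[A-z]/.test("_");            // true
const lettersOnly = /[A-Za-z]/.test("_");        // false

// (a|e): a group containing two alternatives.
const grouped = "gray grey".match(/gr(a|e)y/g);  // ["gray", "grey"]

// Without g, match() also exposes what each numbered group captured.
const captured = "grey".match(/gr(a|e)y/);       // captured[1] is "e"
```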
Meta characters
Meta-character | Description |
---|---|
. | Finds any single character except a line terminator such as newline |
\w | Finds a word character, i.e. anything in [A-Za-z0-9_] |
\W | Finds a non-word character |
\d | Finds a digit |
\D | Finds a non-digit character |
\s | Finds a white-space character |
\S | Finds a character that is not white-space |
\b | Finds a match only at a word boundary, i.e. the beginning or end of a word; beginning like this: \bHI, end like this: HI\b |
\B | Finds a match where \b doesn't, i.e. anywhere that is not a word boundary |
\n | Finds a line feed character |
\f | Finds a form feed character |
\r | Finds a carriage return character |
\t | Finds a tab character |
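
A brief sketch of a few of these meta characters in action, using a made-up phrase:

```javascript
const phrase = "HI there, HIGH five 5 times";

const digits = phrase.match(/\d/g);             // ["5"]
const words = phrase.match(/\w+/g);             // ["HI", "there", "HIGH", "five", "5", "times"]
const spaceCount = phrase.match(/\s/g).length;  // 5

// \b: HI only as a whole word, so the HI inside HIGH is skipped.
const wholeHi = phrase.match(/\bHI\b/g);        // ["HI"]

// \B: HI only when it is NOT followed by a word boundary, i.e. the one inside HIGH.
const insideHi = phrase.match(/HI\B/g);         // ["HI"]
const insideIndex = phrase.search(/HI\B/);      // 10

// . matches any character except a line terminator.
const dotMatches = /a.b/.test("a-b");           // true
const dotNewline = /a.b/.test("a\nb");          // false
```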
Quantifiers
Quantifiers | Description (n here is anything you want to match) |
---|---|
n{2} | match exactly 2 instances of n |
n{1,5} | match at least 1 and at most 5 instances of n |
n{1,} | match at least 1 instance of n |
n+ | Shorthand to match 1 or more instances of n |
n* | Shorthand to match 0 or more instances of n |
n? | Shorthand to match 0 or 1 instances of n |
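
The quantifiers can be checked quickly with a few throwaway strings:

```javascript
// {2}: exactly two.
const exactlyTwo = /^a{2}$/.test("aa");              // true
const threeIsTooMany = /^a{2}$/.test("aaa");         // false

// {1,5}: between one and five; {1,}: one or more.
const sixIsTooMany = /^a{1,5}$/.test("aaaaaa");      // false
const atLeastOne = /^a{1,}$/.test("aaaaaa");         // true

// + needs at least one "a"; * also accepts none.
const plus = "caaat ct".match(/ca+t/g);              // ["caaat"]
const star = "caaat ct".match(/ca*t/g);              // ["caaat", "ct"]

// ?: the "u" is optional.
const optional = "color colour".match(/colou?r/g);   // ["color", "colour"]
```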
Miscellaneous
- The '|' operator separates alternatives, e.g. cat|dog matches either cat or dog. It is often used inside sub-groups.
- The '\1' operator (aka back-reference operator) refers back to the text captured by the sub-group with that index, 1 in this case.
- There is no concatenation operator as such. Things written together are already concatenated.
- Some symbols have a special meaning in expressions and can't be used directly. '\' (backslash) is used to escape them, e.g. \. matches a literal dot.
- Multiple bracket sets can be combined: [a-z] and [0-9] become [a-z0-9], without '|' or a space in between.
- ^ and $ anchor a match to the beginning and end respectively; however, when ^ is used inside brackets, as in [^\w]+, it inverts the selection. In this case no word characters are selected, i.e. only symbols and white spaces are.
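The points above can be sketched in JavaScript as follows; all sample strings are invented for illustration:

```javascript
// | separates alternatives.
const either = "I like cats and dogs".match(/cat|dog/g);    // ["cat", "dog"]

// \1 must repeat whatever group 1 captured: here, a doubled word.
const doubled = "this is is a test".match(/\b(\w+) \1\b/);  // doubled[0] is "is is"

// An unescaped dot matches any character; \. matches only a literal dot.
const unescapedDot = /^3.14$/.test("3x14");                 // true
const escapedDot = /^3\.14$/.test("3x14");                  // false
const literalDot = /^3\.14$/.test("3.14");                  // true

// Combined ranges in a single bracket set.
const alnumRuns = "ab-12_cd".match(/[a-z0-9]+/g);           // ["ab", "12", "cd"]

// ^ inside brackets inverts the set: everything that is not a word character.
const symbols = "hi, there!".match(/[^\w]+/g);              // [", ", "!"]
```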
Let's see some interesting examples
To validate an email, username and password
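Here is a minimal sketch of such validators. The rules themselves (username length of 3 to 16, password of at least 8 characters with a lower-case letter, an upper-case letter and a digit, and a deliberately loose email shape) are assumptions made for this example, and the function names are hypothetical; real projects will have their own requirements.

```javascript
// Assumed rule: something@something.something, with no spaces or extra @ signs.
const emailPattern = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

// Assumed rule: 3 to 16 letters, digits or underscores.
const usernamePattern = /^[a-z0-9_]{3,16}$/i;

function isValidEmail(email) {
  return emailPattern.test(email);
}

function isValidUsername(name) {
  return usernamePattern.test(name);
}

// Assumed rule: at least 8 non-space characters, containing a lower-case
// letter, an upper-case letter and a digit. Each requirement is checked
// with its own small expression to keep every pattern easy to read.
function isValidPassword(password) {
  return /^\S{8,}$/.test(password) &&
    /[a-z]/.test(password) &&
    /[A-Z]/.test(password) &&
    /\d/.test(password);
}
```

For instance, `isValidEmail("user@example.com")` passes while `isValidEmail("user@example")` does not, because the pattern requires a dot after the @ sign.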
Real-world applications of RegEx
- Forms: Usernames, email addresses, passwords etc. are validated with regular expressions, as seen above.
- Search: Regular expressions are used to search for files and within files. Often a search is followed by a replacement.
- Replace: As languages are updated, newer syntax comes into play and older syntax is sometimes deprecated. Changing each line of code by hand would be miserable. To ease the process there are programs that convert the syntax without breaking the behaviour of the original code.
- Data Extraction: Regular expressions are used in SQL (a language for querying databases) to look for records matching some criteria.
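As a toy illustration of search followed by replace (a sketch only; real syntax-conversion tools are far more careful than a single expression), `replace()` accepts a regular expression and can reuse captured groups in the replacement text:

```javascript
// Replace the keyword "var" with "let", but only as a whole word,
// so the word "variable" in the comment is left alone.
const oldCode = "var total = 0; var count = 1; // variable names stay intact";
const newCode = oldCode.replace(/\bvar\b/g, "let");
// "let total = 0; let count = 1; // variable names stay intact"

// $1, $2 and $3 in the replacement refer to the numbered groups.
const isoDate = "2020-05-17";
const reordered = isoDate.replace(/(\d{4})-(\d{2})-(\d{2})/, "$3/$2/$1");
// "17/05/2020"
```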
Summary
With regular expressions in your toolbox, you can manipulate strings like a pro!
Otherwise your life would be much more difficult than it needs to be. Learn more at W3Schools or here.