Regular Expressions - not so regular!

utkarshdhiman48

Utkarsh Dhiman

Posted on May 17, 2020

Introduction

Regular expressions, also known as RegEx or RegExp, are powerful tools widely used for pattern matching within strings. Almost all programming languages support them, but the syntax and usage differ from language to language. Here, I'll be using JavaScript. The website regex101 lets you experiment with regular expressions even without knowing a programming language.

What does it look like?
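
As a minimal illustration (the variable names and sample strings are made up for this sketch), a regular expression in JavaScript can be written either as a literal between slashes or with the RegExp constructor:

```javascript
// Literal syntax: the pattern sits between slashes, flags follow the closing slash.
const helloLiteral = /hello/i;

// Constructor syntax: pattern and flags are passed as strings.
const helloConstructed = new RegExp("hello", "i");

console.log(helloLiteral.test("Hello, world!"));   // true
console.log(helloConstructed.test("Say HELLO"));   // true
console.log(helloLiteral.test("Goodbye"));         // false
```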


This code will make sense as you proceed further.

How to use regular expressions

It depends largely on the language. In JavaScript, the methods match(), test() and search() are widely used. In MySQL, the REGEXP operator (also available as RLIKE) is used for regular-expression searches, while LIKE offers simpler wildcard matching. In PostgreSQL, the ~ operator plays the same role.
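
A short sketch of the three JavaScript methods mentioned above, using a made-up sample sentence. Note that test() is called on the regular expression, while match() and search() are called on the string:

```javascript
const rhyme = "The rain in Spain stays mainly in the plain";

// test() answers a yes/no question: does the pattern occur at all?
const containsAin = /ain/.test(rhyme);

// match() with the g flag collects every match into an array.
const everyAin = rhyme.match(/ain/g);

// search() gives the index of the first match, or -1 if there is none.
const firstAinIndex = rhyme.search(/ain/);
const missingIndex = rhyme.search(/xyz/);

console.log(containsAin);    // true
console.log(everyAin);       // [ 'ain', 'ain', 'ain', 'ain' ]
console.log(firstAinIndex);  // 5
console.log(missingIndex);   // -1
```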

Kneel before the god ;P by memegenerator

Regular Expressions usually have

Quantifiers, Operators, Modifiers (or Flags).

Modifiers

Modifier Description
'g' for global search, i.e. it won't stop searching after the first match and will return all the matches.
'i' for case-insensitive search
'm' for multi-line search, i.e. ^ and $ match at the start and end of every line instead of only at the start and end of the whole string
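
The effect of each modifier can be seen on a small three-line string (the sample text and variable names are invented for this sketch):

```javascript
const pets = "Cat\ncat\nCAT";

// No flags: only the first match is returned, along with its position.
const firstOnly = pets.match(/cat/);

// g: every exact-case match.
const allExact = pets.match(/cat/g);

// g + i: every match, ignoring case.
const anyCase = pets.match(/cat/gi);

// Without m, ^ and $ refer to the whole string, so nothing matches.
const wholeString = pets.match(/^cat$/gi);

// With m, ^ and $ apply to each line.
const eachLine = pets.match(/^cat$/gim);

console.log(firstOnly[0], firstOnly.index); // cat 4
console.log(allExact);                      // [ 'cat' ]
console.log(anyCase);                       // [ 'Cat', 'cat', 'CAT' ]
console.log(wholeString);                   // null
console.log(eachLine);                      // [ 'Cat', 'cat', 'CAT' ]
```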

Brackets

Brackets Description
[abc] matches any one of the characters between the square brackets (like an OR operator in programming languages)
[a-z] matches any lower-case letter
[A-Z] matches any upper-case letter
[A-Za-z] matches any upper-case or lower-case letter (avoid [A-z], which also matches the characters [ \ ] ^ _ ` that sit between Z and a in ASCII)
[0-9] matches any digit
[^abc] matches any character other than a, b or c (^ means negation only when it comes first inside square brackets; elsewhere it is the anchor operator)
(p|q) matches either p or q. The parentheses are known as the grouping operator: they divide the expression into sub-groups. Each sub-group is numbered starting from 1, and that index is used by the back-reference operator.
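
A few of these bracket forms in action. This is a sketch with invented sample strings; it also shows why [A-z] is a risky shorthand for "any letter":

```javascript
// Character ranges and negation.
const digitsFound = "a1b2".match(/[0-9]/g);
const consonants = "hello".match(/[^aeiou]/g);

// [A-z] spans the ASCII gap between Z and a, so it accepts an underscore.
const looseRangeAcceptsUnderscore = /[A-z]/.test("_");
const strictRangeAcceptsUnderscore = /[A-Za-z]/.test("_");

// Alternation inside a group: "gray" or "grey", but not "groy".
const spellings = ["grey", "gray", "groy"].map((word) => /^gr(a|e)y$/.test(word));

// Groups are numbered from 1 in the match result; index 0 is the whole match.
const dateParts = "2020-05-17".match(/([0-9]+)-([0-9]+)-([0-9]+)/);

console.log(digitsFound);                    // [ '1', '2' ]
console.log(consonants);                     // [ 'h', 'l', 'l' ]
console.log(looseRangeAcceptsUnderscore);    // true
console.log(strictRangeAcceptsUnderscore);   // false
console.log(spellings);                      // [ true, true, false ]
console.log(dateParts[1], dateParts[2], dateParts[3]); // 2020 05 17
```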

Meta characters

Meta-character Description
. Finds any single character, except a newline or other line terminator
\w Finds a word character, i.e. [A-Za-z0-9_]
\W Finds a non-word character
\d Finds a digit
\D Finds a non-digit character
\s Finds a white-space character
\S Finds a character that is not white-space
\b Finds a match only at the beginning or end of a word (a word boundary), beginning like this: \bHI, end like this: HI\b
\B Finds a match where \b doesn't, i.e. anywhere that is not a word boundary
\n Finds a line feed character
\f Finds a form feed character
\r Finds a carriage return character
\t Finds a tab character
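
The meta-characters above can be tried out on a short sample string. The strings and variable names here are assumptions made for the sake of the example:

```javascript
const sample = "Hi there_42!\tBye";

const words = sample.match(/\w+/g);        // runs of word characters
const digitChars = sample.match(/\d/g);    // individual digits
const whitespace = sample.match(/\s/g);    // a space and a tab

// \b only matches at a word boundary, \B only where there is none.
const wholeWordCat = "cat concat".match(/\bcat\b/g);
const innerCatIndex = "cat concat".search(/\Bcat/);

// The dot does not match a newline.
const dotMatchesNewline = /./.test("\n");

console.log(words);             // [ 'Hi', 'there_42', 'Bye' ]
console.log(digitChars);        // [ '4', '2' ]
console.log(whitespace);        // [ ' ', '\t' ]
console.log(wholeWordCat);      // [ 'cat' ]
console.log(innerCatIndex);     // 7
console.log(dotMatchesNewline); // false
```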

Quantifiers

Quantifiers Description (n here is anything you want to match)
n{2} match exactly 2 instances of n
n{1,5} match at least 1 and at most 5 instances of n
n{1,} match at least 1 instance of n
n+ Shorthand to match 1 or more instances of n
n* Shorthand to match 0 or more instances of n
n? Shorthand to match 0 or 1 instances of n
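
A quick sketch of the quantifiers, with the letter a standing in for n. The patterns are anchored with ^ and $ so that the whole string has to fit the count; without anchors, a{2} would also be found inside "aaa":

```javascript
const exactlyTwo = ["a", "aa", "aaa"].map((s) => /^a{2}$/.test(s));
const oneToFive = ["", "aaa", "aaaaaa"].map((s) => /^a{1,5}$/.test(s));
const atLeastOne = ["", "a", "aaaaaaa"].map((s) => /^a{1,}$/.test(s));

// + is greedy: it takes as many characters as it can.
const longestRun = "goooal".match(/o+/)[0];

// * allows zero occurrences, ? allows zero or one.
const zeroOrMore = ["a", "abbb"].map((s) => /^ab*$/.test(s));
const optionalU = ["color", "colour", "colouur"].map((s) => /^colou?r$/.test(s));

console.log(exactlyTwo);  // [ false, true, false ]
console.log(oneToFive);   // [ false, true, false ]
console.log(atLeastOne);  // [ false, true, true ]
console.log(longestRun);  // ooo
console.log(zeroOrMore);  // [ true, true ]
console.log(optionalU);   // [ true, true, false ]
```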

Miscellaneous

Meme on escape-sequence by xkcd

'|' operator separates alternatives, for example inside a sub-group.
'\1' operator (aka back-reference operator) refers to the text matched by the earlier sub-group with that index, 1 in this case.
There is no concatenation operator as such. Things written together are already concatenated.
Some symbols have a special meaning in expressions and can't be used directly. A '\' (backslash) placed before such a symbol escapes it, so that it is matched literally.
Multiple bracket sets can be combined: [a-z] and [0-9] become [a-z0-9], without '|' or spaces.
^ and $ anchor a match to the beginning and end respectively. However, when ^ comes first inside square brackets, as in [^\w]+, it inverts the selection. In this case no word characters are selected, i.e. only symbols and white space are selected.
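
The points above can be sketched as follows. The sample strings and names are made up for illustration:

```javascript
// Back-reference: \1 must repeat whatever group 1 matched, here a doubled word.
const repeatedWord = /\b(\w+) \1\b/;
const hasDoubledWord = repeatedWord.test("this is is a typo");
const noDoubledWord = repeatedWord.test("this is fine");

// Escaping: an unescaped dot matches any character, \. matches only a dot.
const unescapedDot = /^3.14$/.test("3x14");
const escapedDot = /^3\.14$/.test("3x14");

// Combined bracket sets: lower-case letters and digits only.
const combinedSets = ["abc123", "abc-123"].map((s) => /^[a-z0-9]+$/.test(s));

// ^ inside brackets inverts the set: everything except word characters.
const nonWordRuns = "Hello, world!".match(/[^\w]+/g);

console.log(hasDoubledWord, noDoubledWord); // true false
console.log(unescapedDot, escapedDot);      // true false
console.log(combinedSets);                  // [ true, false ]
console.log(nonWordRuns);                   // [ ', ', '!' ]
```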

Let's see some interesting examples

To validate an email, username and password
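
A minimal sketch of such validation follows. The rules are assumptions chosen for illustration, not a standard: the username length limits, the simplified email shape and the password requirements are all hypothetical, and real email validation is considerably more involved than this pattern.

```javascript
// Assumed rule: 3 to 16 letters, digits or underscores.
const usernamePattern = /^[A-Za-z0-9_]{3,16}$/;

// Assumed rule: something@something.something, with no spaces or extra @ signs.
// Deliberately loose; it does not implement the full email address grammar.
const emailPattern = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

// Assumed rule: at least 8 non-space characters including a lower-case letter,
// an upper-case letter and a digit. Each requirement is a separate, simple test.
function isStrongPassword(password) {
  return (
    /^\S{8,}$/.test(password) &&
    /[a-z]/.test(password) &&
    /[A-Z]/.test(password) &&
    /\d/.test(password)
  );
}

console.log(usernamePattern.test("utkarsh_48"));    // true
console.log(usernamePattern.test("ab"));            // false (too short)
console.log(emailPattern.test("user@example.com")); // true
console.log(emailPattern.test("user@example"));     // false (no dot after the @)
console.log(isStrongPassword("Passw0rdX"));         // true
console.log(isStrongPassword("password"));          // false (no upper-case letter or digit)
```

Splitting the password check into several small tests keeps each pattern readable and uses only the constructs covered in this post.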

Values are screened by RegExp for security purposes. By Quickmeme

Real-world application of RegEx.

  • Forms: Usernames, email addresses, passwords etc. are validated through regular expressions, as seen above.

  • Search: RegEx are used to search for files and within files. A search is often followed by a replacement.

  • Replace: When a language is updated, newer syntax comes into play and older syntax is sometimes deprecated. Changing each line of code by hand would be miserable. To ease the process, there are programs that can convert the syntax without breaking the behaviour of the original code.

  • Data Extraction: RegEx are used in SQL (a language for querying databases) to look for records matching some criteria.
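
As a toy illustration of search followed by replace, JavaScript's replace() method accepts a regular expression. The code strings below are invented, and real syntax-migration tools parse the code rather than relying on a single pattern:

```javascript
// Replace every standalone "var" keyword; \b keeps words like "variable" intact.
const legacySource = "var a = 1; var variable = 2;";
const modernSource = legacySource.replace(/\bvar\b/g, "let");

// Captured groups can be reused in the replacement as $1, $2, $3.
const isoDate = "2020-05-17";
const dayFirst = isoDate.replace(/(\d{4})-(\d{2})-(\d{2})/, "$3/$2/$1");

console.log(modernSource); // let a = 1; let variable = 2;
console.log(dayFirst);     // 17/05/2020
```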

Saved the day. Source Xkcd

Summary

With regular expressions in your toolbox, you can manipulate strings like a pro!
Otherwise your life would be much more difficult than it needs to be. Learn more at W3Schools
or here
