An Overview of Regex Expressions
Noor Sheikh
Posted on April 5, 2020
In this post, I give a quick overview of regular expressions (regex). The overview is based on what I learned in one of my recent MS courses.
Here is a definition of a regular expression from the internet:
A regular expression (regex or regexp for short) is a special text string for describing a search pattern. You can think of regular expressions as wildcards on steroids. You are probably familiar with wildcard notations such as *.txt to find all text files in a file manager. The regex equivalent is ^.*\.txt$.
Let's start with an example regex:
^[A-Za-z]+[._-]?[A-Za-z0-9]*[@][A-Za-z0-9]{2,}\.[a-z]{2,6}$
Any guess what the above regex represents? If your guess is an email address, you are right. The regex describes a common shape for an email address. It does not cover every valid email address, but it works well as an example here.
Below are email addresses that match the above regex:
firstlast@domain.com
first.last@domain.net
first_last@domain.us
first-last12@longdomain.online
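The list above can be checked with a few lines of Python. This is a minimal sketch using the standard re module; the is_email helper and the two extra non-matching samples are my own additions for illustration, not part of the original example.

```python
import re

# The email pattern from this post, unchanged.
EMAIL_PATTERN = re.compile(
    r"^[A-Za-z]+[._-]?[A-Za-z0-9]*[@][A-Za-z0-9]{2,}\.[a-z]{2,6}$"
)

def is_email(text):
    """Return True if the text matches EMAIL_PATTERN (hypothetical helper)."""
    return EMAIL_PATTERN.search(text) is not None

samples = [
    "firstlast@domain.com",
    "first.last@domain.net",
    "first_last@domain.us",
    "first-last12@longdomain.online",
    "first..last@domain.com",  # two periods in a row: not matched
    "firstlast@domain",        # no period and ending: not matched
]

for sample in samples:
    print(sample, is_email(sample))
```

The first four samples print True and the last two print False, which also shows that the pattern is stricter than real-world email rules.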
Let's break down the regex and compare it with the matching addresses.
First of all, this regex begins with a ^ caret and ends with a $ dollar sign. These two signs are anchors: ^ matches the start of the text and $ matches the end of the text, so the pattern has to cover the whole text rather than just a part of it.
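To see what the ^ and $ signs do in practice, here is a small Python sketch. The three-letter pattern and the matches helper are made up for this illustration; the point is that the version with ^ and $ accepts a text only when the pattern covers it from start to end.

```python
import re

# Hypothetical sample pattern: exactly three lowercase letters.
unanchored = re.compile(r"[a-z]{3}")
anchored = re.compile(r"^[a-z]{3}$")

def matches(pattern, text):
    """Return True if the pattern is found anywhere in the text."""
    return pattern.search(text) is not None

print(matches(unanchored, "12abc34"))  # True: abc is found in the middle
print(matches(anchored, "12abc34"))    # False: the text does not start with a letter
print(matches(anchored, "abc"))        # True: the whole text is three letters
```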
Now, let's extract the first portion of the email before the @ sign, which can also be called the username.
[A-Za-z]+[._-]?[A-Za-z0-9]*
Let's break it further into three parts.
- [A-Za-z]+ : this part of the pattern represents one or more (+) letters, upper or lower case (A-Za-z). Examples are firstlast and first from the emails above.
- [._-]? : this part of the pattern represents an optional (?) special character: a period (.), an underscore (_) or a dash/hyphen (-), as seen in the example emails above.
- [A-Za-z0-9]* : finally, this part of the pattern represents zero or more (*) upper or lower case letters (A-Za-z) or digits from 0 to 9 (0-9). It covers whatever follows the optional special character, such as last or last12.
[@]
denotes the at sign in the email.
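The three username parts can be made visible with capture groups. The sketch below is only an illustration: the parentheses, the ^ and $ around the username pattern, and the split_username helper are my additions and are not part of the original regex.

```python
import re

# Username portion of the email regex, with one capture group per part.
USERNAME = re.compile(r"^([A-Za-z]+)([._-]?)([A-Za-z0-9]*)$")

def split_username(username):
    """Return the three parts as a tuple, or None if there is no match."""
    m = USERNAME.match(username)
    return m.groups() if m else None

print(split_username("firstlast"))     # ('firstlast', '', '')
print(split_username("first.last"))    # ('first', '.', 'last')
print(split_username("first-last12"))  # ('first', '-', 'last12')
print(split_username("first..last"))   # None
```

An empty string in the second or third position means that the optional part was simply not present.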
Finally, the last portion of an email is the domain name of the provider, and it is denoted in the regex as below.
[A-Za-z0-9]{2,}\.[a-z]{2,6}
Let's break it further into three parts:
- [A-Za-z0-9]{2,} : this part of the pattern represents two or more ({2,}) upper or lower case letters (A-Za-z) or digits from 0 to 9 (0-9). An example is domain in domain.com from the above list of emails.
- \. : this part represents the period used in the domain name part of the email. Note: \ is used for escaping, because an unescaped period matches any character.
- [a-z]{2,6} : this part represents two to six ({2,6}) lower case letters from a to z (a-z), which form the last portion of the email after the period.
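The domain portion can be tried on its own in the same way. In this sketch the capture groups, the surrounding ^ and $, and the split_domain helper are assumptions added for illustration.

```python
import re

# Domain portion of the email regex, with one group on each side of the period.
DOMAIN = re.compile(r"^([A-Za-z0-9]{2,})\.([a-z]{2,6})$")

def split_domain(domain):
    """Return (name, ending) as a tuple, or None if there is no match."""
    m = DOMAIN.match(domain)
    return m.groups() if m else None

print(split_domain("domain.com"))         # ('domain', 'com')
print(split_domain("longdomain.online"))  # ('longdomain', 'online')
print(split_domain("d.com"))              # None: name shorter than 2 characters
print(split_domain("domain.c"))           # None: ending shorter than 2 letters
print(split_domain("mail.domain.com"))    # None: the pattern allows one period only
```

The last case is one reason the pattern is not a complete email validator: addresses on subdomains are rejected.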
Explanation of regex characters:
^
: indicates the start of the text being matched.
Example usage: ^.*$ (it matches any run of characters on one line, such as ABCabc123!#@$#%$#%)
$
: indicates the end of the text being matched.
Example usage: ^.*$ (it matches any run of characters on one line, such as ABCabc123!#@$#%$#%)
\
: indicates escaping in a regex.
Example usage: [a-z]\.[a-z] (the escaped period matches a literal period between two letters, such as c.d inside abc.def)
+
: indicates one or more repetitions of the preceding element.
Example usage: [a-z]+ (it matches one or more lowercase letters, such as abcdef)
*
: indicates zero or more repetitions of the preceding element.
Example usage: [a-z]+[0-9]* (the digits at the end of the text are optional; abcdef123 and abcdef are both valid results)
?
: indicates zero or one occurrence of the preceding element.
Example usage: 0?[1-9] (it makes the zero optional at the beginning of a single-digit number; 01 and 1 are both valid results)
|
: indicates or/alternative in a pattern.
Example usage: (cat|dog) (the valid result of the expression is either cat or dog)
[]
: indicates a character class, which matches any one of the characters inside the brackets.
Example usage: ca[tr] (the valid result of the expression is ca followed by one of the characters inside the brackets: car or cat)
()
: indicates the grouping of values in a pattern.
Example usage: (1|2|3) (the valid result of the expression is one of the values inside the group, separated by the pipe sign: 1, 2 or 3)
A-Z
: inside brackets, indicates upper case letters from A to Z.
Example usage: ^[A-Z][a-z]+ (it requires a capital first letter followed by lower case letters, as in the first names John, Mark, etc.)
a-z
: inside brackets, indicates lower case letters from a to z.
Example usage: ^[A-Z][a-z]+ (the lower case letters after the capital in John, Mark, etc.)
0-9
: inside brackets, indicates digits from 0 to 9.
Example usage: ^[2-9][0-9]{2} [1-9][0-9]{2}-[0-9]{4} (it matches 340 597-1234)
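A literal space or \s
: indicates white space in a pattern. A literal space matches only a space, while \s also matches tabs and line breaks.
Example usage: [A-Z][a-z]+\s[A-Z][a-z]+ (it matches the white space between a first name and a last name, as in Steve Jobs)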
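The characters explained above can be tried out together in a short Python script. This is a rough sketch: the whole_match helper and the non-matching texts are my own additions, and the name pattern in the last row uses + after [a-z] (an assumption) so that it can cover full names.

```python
import re

# (pattern, text that should match, text that should not match)
EXAMPLES = [
    (r"[a-z]\.[a-z]", "c.d", "cxd"),          # \ escapes the period
    (r"[a-z]+", "abcdef", ""),                # + needs at least one letter
    (r"[a-z]+[0-9]*", "abcdef123", "123"),    # * makes the digits optional
    (r"0?[1-9]", "01", "00"),                 # ? makes the zero optional
    (r"(cat|dog)", "dog", "cow"),             # | picks one alternative
    (r"ca[tr]", "car", "cab"),                # [] picks one character
    (r"[A-Z][a-z]+\s[A-Z][a-z]+", "Steve Jobs", "SteveJobs"),  # \s is white space
]

def whole_match(pattern, text):
    """Return True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

for pattern, good, bad in EXAMPLES:
    print(pattern, whole_match(pattern, good), whole_match(pattern, bad))
```

Each row prints True for the first text and False for the second. re.fullmatch is used here so that the patterns do not need their own ^ and $.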
Bonus
US Phone Number
Pattern
^[2-9][0-9]{2}\s[1-9][0-9]{2}-[0-9]{4}$
Explanation
^[2-9][0-9]{2}
: Start the expression, add an initial digit between 2 and 9 followed by two additional digits between 0 and 9.
\s
: Represents white space.
[1-9][0-9]{2}
: After white space, add a digit between 1 and 9 followed by two additional digits between 0 and 9.
-[0-9]{4}$
: Add a hyphen followed by four additional digits between 0 and 9 and mark the end of the expression.
Matching Phone Number
234 123-4567
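A quick way to try the phone pattern is a small Python check. This is a minimal sketch; the is_us_phone helper and the non-matching numbers are invented for illustration.

```python
import re

# The US phone number pattern from this post, unchanged.
US_PHONE = re.compile(r"^[2-9][0-9]{2}\s[1-9][0-9]{2}-[0-9]{4}$")

def is_us_phone(text):
    """Return True if the text matches US_PHONE (hypothetical helper)."""
    return US_PHONE.search(text) is not None

print(is_us_phone("234 123-4567"))  # True
print(is_us_phone("134 123-4567"))  # False: first digit must be 2 to 9
print(is_us_phone("234 023-4567"))  # False: fourth digit must be 1 to 9
print(is_us_phone("234-123-4567"))  # False: white space expected after the first three digits
```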
Social Security Number
Pattern
^[0-9]{3}-[0-9]{2}-[0-9]{4}$
Explanation
^[0-9]{3}-
: Start the expression and add three digits between 0 and 9 followed by a hyphen.
[0-9]{2}-
: Add two digits between 0 and 9 followed by a hyphen.
[0-9]{4}$
: Add four digits between 0 and 9 and mark the end of the expression.
Matching SSN
000-00-0000
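The same kind of check works for the SSN pattern. Note that the pattern only tests the layout of digits and hyphens, not whether the number was ever issued. The is_ssn helper and the extra samples are assumptions added for this sketch.

```python
import re

# The Social Security Number pattern from this post, unchanged.
SSN = re.compile(r"^[0-9]{3}-[0-9]{2}-[0-9]{4}$")

def is_ssn(text):
    """Return True if the text has the 3-2-4 digit layout (hypothetical helper)."""
    return SSN.search(text) is not None

print(is_ssn("000-00-0000"))  # True
print(is_ssn("123-45-6789"))  # True
print(is_ssn("123456789"))    # False: hyphens are required
print(is_ssn("123-456-789"))  # False: groups must have 3, 2 and 4 digits
```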
Street Address
Pattern
^[1-9][0-9]{3}\s[A-Z][a-z]+\s[A-Z]([a-z]\.|[a-z]+)$
Explanation
^[1-9][0-9]{3}
: Start the expression, add a digit between 1 and 9 followed by 3 more digits between 0 and 9.
\s
: Add white space.
[A-Z][a-z]+
: Add an upper case letter followed by one or more lower case letters.
\s
: Add white space.
[A-Z]([a-z]\.|[a-z]+)$
: Add an upper case letter followed by either one lower case letter and a period, or one or more lower case letters, and mark the end of the expression. The parentheses keep the | alternative limited to this last part; without them, the | would split the whole pattern in two.
Matching Street Address
1234 Sample Street
1234 Sample St.
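Here is a minimal Python sketch for the street address examples. It assumes that the last part of the pattern is grouped as ([a-z]\.|[a-z]+), so that the either/or choice applies only to the end of the street type; that grouping, the is_street_address helper and the non-matching samples are my own assumptions.

```python
import re

# Street address pattern with the final alternative grouped (assumption).
STREET = re.compile(r"^[1-9][0-9]{3}\s[A-Z][a-z]+\s[A-Z]([a-z]\.|[a-z]+)$")

def is_street_address(text):
    """Return True if the text matches STREET (hypothetical helper)."""
    return STREET.search(text) is not None

print(is_street_address("1234 Sample Street"))  # True
print(is_street_address("1234 Sample St."))     # True
print(is_street_address("0234 Sample Street"))  # False: number cannot start with 0
print(is_street_address("1234 Sample"))         # False: street type is missing
print(is_street_address("123 Sample Street"))   # False: number needs four digits
```

The pattern is deliberately narrow: it accepts only four-digit house numbers and a one-word street name.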