Regular Expressions As Applied In Programming

kwereutosu

Kwerenachi Utosu

Posted on March 30, 2021

Regular expressions, also known as regex, work by describing a pattern of characters that text must match. They are especially useful when working with large amounts of data.

With that much data, updating or altering entries one at a time would take far too long, so we use regex to select all the matching entries and update them together.

What is a Regular Expression?

A regular expression is a sequence of characters that specifies a search pattern, used to match character combinations in strings. Such patterns are typically used by string-searching algorithms for find or find-and-replace operations on strings, or for input validation.

For example, a simple expression like [a-zA-Z] matches any single letter of the English alphabet, uppercase or lowercase. Words made up only of plain letters without tone marks, such as code, table, and Markdown, consist entirely of characters that this expression matches.

Some Major Uses Of Regex In Programming
Regular Expressions as stated earlier are very useful to programmers for a variety of tasks.

  • For implementing a find and replace function that identifies a group or sequence of characters and replaces it with something else, as specified by the expression.

  • For search operations, for example, locating a particular string in a whole document.

  • For input validation, for example, checking passwords, email addresses, and other input types that must follow a specific format or include a special character.
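
As a rough sketch of these three uses, the snippet below runs a search, a find and replace, and a validation check. The sample text, the `isValidUsername` helper, and its rule (3 to 16 letters, digits, or underscores) are assumptions made up for illustration, not rules from any particular system.

```javascript
// Hypothetical sample text, chosen only for this illustration.
const text = "Order 66 shipped on 2021-03-30.";

// Search: does the text contain something shaped like YYYY-MM-DD?
const hasDate = /[0-9]{4}-[0-9]{2}-[0-9]{2}/.test(text);

// Find and replace: mask every digit with "#".
const masked = text.replace(/[0-9]/g, "#");

// Input validation: an assumed rule of 3 to 16 letters, digits, or underscores.
const isValidUsername = (name) => /^[A-Za-z0-9_]{3,16}$/.test(name);

console.log(hasDate);                      // true
console.log(masked);                       // Order ## shipped on ####-##-##.
console.log(isValidUsername("kwere_01"));  // true
console.log(isValidUsername("no spaces")); // false
```

The metacharacters used in these patterns are explained one by one in the sections that follow.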

Metacharacters In Regular Expressions

Before we dive into writing some Regex and explaining the various patterns, we must first understand the metacharacters involved in creating this expression.

A metacharacter is a character that has a special meaning to a computer program, such as a shell interpreter or a regular expression (regex) engine.

       [  ]  -  +  *  ?  ^  $  (  )  {  }  |  .  \

Each of these metacharacters above has a completely different meaning when interpreted in a Regex enabled environment.

Square Bracket [ ]
Square brackets define a set of characters, any one of which may match at that position. A set can also contain a range. For example, [0-9] simply means any single digit from 0 to 9.

Hyphen -
Inside square brackets, the hyphen simply means "to", i.e. a range or an interval, as in [a-z].
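
A minimal sketch of sets and ranges in JavaScript is shown below. The variable names and sample strings are made up for illustration. It also shows that a hyphen outside square brackets is just an ordinary character.

```javascript
const digit = /[0-9]/;            // any one character from 0 to 9
const lowHex = /[a-f]/;           // any one character from a to f
const literalHyphen = /2021-03/;  // outside brackets, "-" matches itself

console.log(digit.test("room 7"));  // true
console.log(digit.test("seven"));   // false
console.log(lowHex.test("xyz"));    // false
console.log(lowHex.test("code"));   // true (c, d and e are in the range)
console.log(literalHyphen.test("Posted 2021-03-30")); // true
```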

Plus Sign +
This matches One or more of the preceding character or group of characters.

For example, a simple expression like /ab+c/ matches an a, followed by one or more bs, followed by a c. Some strings it matches are abc, abbc, abbbc, and so on.

Asterisk *
This matches Zero or more of the preceding character or group of characters.

For example, a simple expression like /ab*c/ matches an a, followed by zero or more bs, followed by a c. Some strings it matches are ac, abc, abbc, abbbc, and so on.

Question Mark ?
This matches Zero or one of the preceding character or group of characters.

For example, a simple expression like /ab?c/ matches an a, followed by zero or one b, followed by a c. It matches only two strings in full: ac and abc.
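
To compare the three quantifiers side by side, here is a small sketch that tests the same sample strings against each one. The patterns are wrapped in ^ and $ (both covered below) so that the whole string has to match. The variable names are only illustrative.

```javascript
const samples = ["ac", "abc", "abbc", "abbbc"];

const oneOrMore = samples.filter((s) => /^ab+c$/.test(s));
const zeroOrMore = samples.filter((s) => /^ab*c$/.test(s));
const zeroOrOne = samples.filter((s) => /^ab?c$/.test(s));

console.log(oneOrMore);  // [ 'abc', 'abbc', 'abbbc' ]
console.log(zeroOrMore); // [ 'ac', 'abc', 'abbc', 'abbbc' ]
console.log(zeroOrOne);  // [ 'ac', 'abc' ]
```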

Caret ^
The caret serves two different purposes. It is a special character that denotes “the beginning of a line” and it is a “not” operator inside of []s.
Take a look at these three examples,

  1. /([^aeiou])/
  2. /^([aeiou])/
  3. /^([^aeiou])/

The first example simply matches any character that is not a vowel.
The second one matches a vowel at the start of a line.
And the third one matches any character that is not a vowel, at the start of a line.
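
The three expressions can be tried out with a quick sketch like the one below. The sample words are arbitrary, and note that [aeiou] only covers lowercase vowels.

```javascript
const notVowel = /([^aeiou])/;           // any character that is not a vowel
const startsWithVowel = /^([aeiou])/;    // a vowel at the start
const startsWithNonVowel = /^([^aeiou])/; // a non-vowel at the start

console.log(notVowel.test("aeiou"));           // false (only vowels)
console.log(notVowel.test("audio"));           // true (the d)
console.log(startsWithVowel.test("apple"));    // true
console.log(startsWithVowel.test("table"));    // false
console.log(startsWithNonVowel.test("table")); // true
console.log(startsWithNonVowel.test("apple")); // false
```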

Dollar Sign $
This matches the end of a line. For example, /([aeiou])$/ matches a vowel at the end of the line. Note that the $ comes after the characters it anchors.
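
Here is a short sketch of end-of-line matching, with the $ placed after the character set it anchors. The sample words are arbitrary.

```javascript
const endsWithVowel = /([aeiou])$/;

console.log(endsWithVowel.test("code"));   // true (ends in e)
console.log(endsWithVowel.test("regex"));  // false (ends in x)

// The captured vowel is available at index 1 of the match result.
const lastVowel = "table".match(endsWithVowel)[1];
console.log(lastVowel); // e
```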

Parentheses ()
Parentheses in a regular expression usually indicate a ‘capture group’, or a subset of the string to be stored for later reference.

Take an example like /^([aeiou])([0-9])/.
It matches a string starting with a single vowel (the first capture group is a single vowel), followed by any single digit (the second capture group is any digit).
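
In JavaScript, the captured groups can be read back from the array that match() returns, as this sketch shows. The input string is made up for illustration.

```javascript
const pattern = /^([aeiou])([0-9])/;
const result = "a7 is the seat".match(pattern);

console.log(result[0]); // a7 (the whole match)
console.log(result[1]); // a  (first capture group: the vowel)
console.log(result[2]); // 7  (second capture group: the digit)

// No match at all when the string does not start with a vowel.
console.log("b7".match(pattern)); // null
```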

Curly Braces {}
This matches a specific number of repetitions. It is used when you want to be very specific about the number of occurrences of the preceding element or subexpression in the source string. Curly braces and their contents are known as interval expressions. For example, {n} means the preceding element or subexpression must occur exactly n times, and {n,m} means it must occur at least n and at most m times.
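
A brief sketch of interval expressions follows. The four-digit PIN rule is an assumed example, not a recommendation, and the patterns are anchored with ^ and $ so the counts apply to the whole string.

```javascript
const exactlyThree = /^a{3}$/;   // exactly three a's
const twoToFour = /^a{2,4}$/;    // between two and four a's
const fourDigitPin = /^[0-9]{4}$/; // hypothetical rule: exactly four digits

console.log(exactlyThree.test("aaa"));   // true
console.log(exactlyThree.test("aaaa"));  // false
console.log(twoToFour.test("a"));        // false
console.log(twoToFour.test("aaaa"));     // true
console.log(twoToFour.test("aaaaa"));    // false
console.log(fourDigitPin.test("2021"));  // true
console.log(fourDigitPin.test("21"));    // false
```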

Pipe Symbol |
Two regular expressions separated by a pipe symbol match either an occurrence of the first or an occurrence of the second.

For example, a simple expression like /(a|b)*c/ matches zero or more characters that are each either an a or a b, followed by a c. Some strings it matches are c, ac, bc, abac, and so on.
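
The sketch below tries the alternation on a few made-up strings. The anchored variant (with ^ and $) is an addition for illustration, since without anchors the pattern can match anywhere inside a longer string.

```javascript
const words = ["c", "ac", "bc", "abbac", "adc"];

// Anchored: the whole string must be a's and b's followed by one c.
const fullMatches = words.filter((w) => /^(a|b)*c$/.test(w));
console.log(fullMatches); // [ 'c', 'ac', 'bc', 'abbac' ]

// Unanchored: "adc" still passes, because the lone c at the end matches.
console.log(/(a|b)*c/.test("adc")); // true

// Alternation between whole words works the same way.
console.log(/cat|dog/.test("hot dog")); // true
```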

Dot .
The dot operator matches any single character, except a line break in most engines by default. This makes it useful as a wildcard.

For example, the expression /^([0-9])(.)/ simply matches a string that starts with a digit, followed by any character.

Backslash \
A backslash is used to escape metacharacters. The term "to escape a metacharacter" means to make the metacharacter ineffective (to strip it of its special meaning), causing it to have its literal meaning.

For example, as a metacharacter a dot (".") stands for any single character. So an expression like ab. matches an a, then a b, followed by any character. But when the dot is escaped, as in ab\., the expression matches only the literal text ab followed by a dot.
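
The difference between the escaped and unescaped dot can be checked with a sketch like this one, using arbitrary sample strings:

```javascript
const anyChar = /ab./;     // a, b, then any single character
const literalDot = /ab\./; // a, b, then an actual dot

console.log(anyChar.test("abc"));     // true
console.log(anyChar.test("ab."));     // true (a dot is also "any character")
console.log(literalDot.test("abc"));  // false
console.log(literalDot.test("ab."));  // true
```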

Regular Expressions In Methods And Classes

Now let's apply all we've learned in the examples below

Example 1
The expression in this first example can match two different words, like and live, in the string given as str in the code snippet below. Since test() returns true as soon as one match is found, the program shows an affirmative alert.

var regex = /li[kv]e/;
var str = "They like the live show.";

// Test the string against the regular expression
if(regex.test(str)) {
    alert("Match found!");
} else {
    alert("Match not found.");
}

Output

Match found!

Example 2
In this next example, the g (global) flag is added after the closing slash so that every occurrence of the pattern in the string below is recorded, not just the first.

var regex = /[rt]?each/g;
var str = "To teach means to reach as many kids, each also reaching others";
var matches = str.match(regex);

// match() returns null when nothing matches, so check its result directly
if(matches) {
    alert(matches.length + " Matches found!");
} else {
    alert("Match not found.");
}

The matches for this example are teach, reach, each, and reach (from reaching), following the explanation given previously, which makes 4 in total. You can try running this code snippet for confirmation.
Output

4 Matches found!

Character Classes

A character class is the most basic regex concept after a literal match. It lets one short sequence of characters stand for a larger set of characters. For example, \d means any digit and, in engines that support it, [A-Z] can be written as \u.

See some character classes below:

  • \d represents digits. It is also written as [0-9].
  • \D represents Non-digits. It is also written as [^0-9].
  • \w represents alphanumeric characters and the underscore. It is also written as [A-Za-z0-9_].
  • \W represents non-word characters. It is also written as [^A-Za-z0-9_].
  • \s represents whitespace characters. It is also written as [ \t\r\n\v\f].
  • \S represents non-whitespace characters. It is also written as [^ \t\r\n\v\f].
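  • \a represents alphabetic characters. It is also written as [A-Za-z].
  • \l represents lowercase characters. It is also written as [a-z].
  • \u represents uppercase characters. It is also written as [A-Z].
  • \x represents hexadecimal digits. It is also written as [A-Fa-f0-9].
  • \b simply represents word boundaries. It matches a position rather than a character.
  • g simply represents global search, meaning it'll match all occurrences. In JavaScript it is a flag written after the closing slash, as in /\d/g, rather than a backslash sequence.

Note that these character classes differ between programming languages and test environments. For instance, \a, \l, \u and \x as listed above are shorthands in editors such as Vim, while JavaScript does not treat them as character classes, so use the bracketed forms there.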
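
As a small sketch of the shorthands that JavaScript does support (\d, \D, \w, \s and \b), the snippet below applies them to a made-up string. The variable names are only illustrative.

```javascript
const str = "Room 42, Floor_3";

const digits = str.match(/\d/g);       // every single digit
const numbers = str.match(/\d+/g);     // runs of digits
const words = str.match(/\w+/g);       // runs of word characters
const pieces = str.split(/\s+/);       // split on whitespace
const initials = str.match(/\b\w/g);   // first character after each word boundary
const digitsOnly = str.replace(/\D/g, ""); // strip every non-digit

console.log(digits);     // [ '4', '2', '3' ]
console.log(numbers);    // [ '42', '3' ]
console.log(words);      // [ 'Room', '42', 'Floor_3' ]
console.log(pieces);     // [ 'Room', '42,', 'Floor_3' ]
console.log(initials);   // [ 'R', '4', 'F' ]
console.log(digitsOnly); // 423
```

Because the underscore counts as a word character, Floor_3 is treated as one word and there is no boundary before the 3.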

Character Classes In Programming

We'll look at a few more examples that involve character classes and other literals.

Example 3
The expression below splits the string into an array of substrings. According to the regex, the split criterion is any run of whitespace or commas. Each part is also logged to the console so that you can see the list of strings.

var regex = /[\s,]+/;
var str = "My house is painted white, purple and blue";
var parts = str.split(regex);

// Loop through parts array and display substrings
for(var part of parts){
    console.log(part)
    document.write("<p>" + part + "</p>");
}

Example 4
In this next example, we replace all the digits in the str variable with hyphens.

var regex = /\d/g;
var replacement = "-";
var str = "There 3 kids, 4 adults and 98 animals";

// Replace digits with -
var matches = str.replace(regex, replacement);
console.log(matches)

Example 5
For the fifth example, we do another replace, this time replacing all lowercase vowels with an asterisk.

var re = /[aeiou]/g;
var replace = "*";
var str2 = "Every one of them knows the answer to No 2";

// Replace lowercase vowels with *
var matches2 = str2.replace(re, replace);
console.log(matches2)

You can also retest the given code snippets above and try out new expressions in your free time. As usual, the more you practise, the higher your chances of becoming an expert.

Conclusion

Regular expressions can be applied in many areas of computer programming and data management, including SQL and other database operations. With regex there are always a lot of scenarios to try out. I hope this article has been helpful to you. It doesn't stop here though: practice, practice and more practice.
