Finally learning Regex
ibrahim ali
Posted on March 1, 2021
On the first day of the Bootcamp phase of Operation Spark, one of our first practice problems was to transform a given string into dash case. The solution was a simple .split().join() chain, but I went online and found the .replace() method, which led me to regular expressions, shortened to regex. I ended up watching a 45-minute video on regex and, having only the most basic knowledge of JavaScript at the time, came out way more confused than I went in. Since then, whenever I've researched a problem involving complex string manipulation and found that the solution requires regex, I've opted out, preferring the .split().join() approach, character-by-character loops, or literally anything else but the dreaded regex. But now, 13 weeks into my Operation Spark journey, I've decided to finally tackle my regex anxieties and add another skill to my programming repertoire.
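For reference, here's a minimal JavaScript sketch of that first exercise, assuming "dash case" means lowercase words joined by hyphens and that the words are separated by spaces. The function names `toDashCaseSplit` and `toDashCaseRegex` are hypothetical, chosen just for illustration.

```javascript
// Hypothetical helpers: both assume the input words are separated by whitespace.

// The .split().join() approach: break on single spaces, glue back with dashes.
function toDashCaseSplit(str) {
  return str.toLowerCase().split(" ").join("-");
}

// The .replace() approach: /\s+/g matches every run of whitespace,
// so double spaces or tabs collapse into a single dash.
function toDashCaseRegex(str) {
  return str.toLowerCase().replace(/\s+/g, "-");
}

console.log(toDashCaseSplit("Hello World"));   // hello-world
console.log(toDashCaseRegex("Hello   World")); // hello-world
console.log(toDashCaseSplit("Hello   World")); // hello---world (one dash per space)
```

The regex version copes with messy spacing for free, which is a big part of why .replace() with a pattern is worth learning.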
"A regular expression (shortened as regex or regexp; also referred to as rational expression) is a sequence of characters that specifies a search pattern. Usually, such patterns are used by string-searching algorithms for "find" or "find and replace" operations on strings, or for input validation." -Wikipedia.
Regex dates back to the 1950s, when mathematician Stephen Cole Kleene described regular languages: formal languages that can be defined by a small set of rules. This gave computer science theorists a way to bring pattern matching into code, and it was used in early text editors and compilers. Decades later, the same rules still apply: regex is built into most programming languages and runs behind the scenes of search engines, document editors, and many other applications.
In this post, I'll give a rundown of basic regex syntax and special characters. Here is a nifty website for practicing regular expressions in real time.
The most basic piece of regex syntax is the pair of forward slashes. In JavaScript, a regex literal is written between them, and whatever you type between the slashes is the pattern the regex searches for.
/regex/ // matches the text "regex"
The one exception is flags, which go right after the closing forward slash.
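A quick way to try the slash syntax is JavaScript's `.test()` method, which returns `true` or `false` depending on whether the pattern appears in a string. This is a minimal sketch; the sample strings are made up.

```javascript
// A regex literal: the pattern goes between the two forward slashes.
const pattern = /regex/;

// .test() returns true if the pattern appears anywhere in the string.
console.log(pattern.test("Finally learning regex")); // true
console.log(pattern.test("Finally learning RegEx")); // false: matching is case-sensitive by default

// The RegExp constructor builds the same pattern from a plain string.
const samePattern = new RegExp("regex");
console.log(samePattern.source === pattern.source); // true
```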
//the global flag
/the/g
The global flag, represented by g, applies the regex to every match in the string; without it, the search stops after the first match.
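Since .replace() is what led me to regex in the first place, here's a small sketch (with an invented sample sentence) of how the g flag changes its behavior, plus .match(), which returns every match as an array when g is set.

```javascript
const sentence = "the cat and the dog";

// Without g, .replace() stops after the first match.
const firstOnly = sentence.replace(/the/, "a");   // "a cat and the dog"

// With g, every match is replaced.
const everyMatch = sentence.replace(/the/g, "a"); // "a cat and a dog"

// .match() with g returns an array of all the matches.
const allMatches = sentence.match(/the/g);        // ["the", "the"]
```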
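The lowercase i makes the search case-insensitive.
The lowercase m turns on multiline mode, so ^ and $ match at the start and end of each line instead of only the start and end of the whole string. The lowercase s (single-line, or "dotAll", mode) lets the period match newline characters too.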
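/the/gi // global + case-insensitive: matches "the", "The", and "THE"
/^the/gm // global + multiline: matches "the" at the start of each line
/the.end/s // dotAll: the period can also match a newline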
There are a few other flags, but these are the most basic and frequently used ones.
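To see the i, m, and s flags side by side, here's a minimal sketch that runs them against one made-up two-line string. It uses `.match()` with g, which returns an array of every match, or `null` when nothing matches.

```javascript
const text = "The cat saw the dog.\nthe end";

// Case-sensitive by default: only the two lowercase "the"s match.
const lower = text.match(/the/g);        // ["the", "the"]

// i makes matching case-insensitive, so "The" is found too.
const anyCase = text.match(/the/gi);     // ["The", "the", "the"]

// Without m, ^ means "start of the whole string", which begins with "The".
const noMultiline = text.match(/^the/g); // null

// With m, ^ also matches right after each newline.
const multiline = text.match(/^the/gm);  // ["the"]

// By default the period does not match "\n"; with s (dotAll) it does.
const withoutS = /dog.*end/.test(text);  // false
const withS = /dog.*end/s.test(text);    // true
```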
Then there are special characters. These are really the bread and butter of regex. Each one does a specific job, and when chained together they become really powerful tools for string manipulation.
The plus operator, +, matches one or more of the character placed right before it. For instance, /e+/ matches the single "e" in "set" and the double "ee" in "street". Note that it doesn't require two e's in a row; to find only words like "street", you would search for /ee/ instead.
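Here's a minimal sketch of + in action on a few sample words, showing that it grabs as many e's as it can but also settles for just one.

```javascript
// e+ matches one "e" or several "e"s in a row (as many as it can).
const inSet = "set".match(/e+/)[0];       // "e"
const inStreet = "street".match(/e+/)[0]; // "ee"
const inStt = "stt".match(/e+/);          // null: + needs at least one "e"

// Finding only words with a double "e" means asking for "ee" explicitly.
const doubleE = ["set", "street", "sheep"].filter((w) => /ee/.test(w)); // ["street", "sheep"]
```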
The optional operator, ?, makes the character right before it optional. In /ow?/, the question mark is placed right after the "w", so the regex looks for an "o" and also matches the "w" if it's there.
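A small sketch of ? with invented sample text: the "o" is required, the "w" is taken only when present. The /colou?r/ line is an extra illustration of the same idea, handling two spellings with one pattern.

```javascript
// In /ow?/ the ? applies to the "w" only: "o" is required, "w" is optional.
const matches = "cow co owl".match(/ow?/g); // ["ow", "o", "ow"]

// A common real-world use: accepting both spellings of a word.
const spelling = /colou?r/;
const american = spelling.test("color");  // true
const british = spelling.test("colour");  // true
```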
The star operator, *, matches zero or more of the character before it. In /re*/, the regex matches an "r" followed by any number of "e"s, including none at all.
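This sketch (sample string invented) contrasts * with +: the star is happy with zero e's, while the plus needs at least one.

```javascript
// re* means an "r" followed by zero or more "e"s.
const star = "r re ree".match(/re*/g); // ["r", "re", "ree"]

// re+ needs at least one "e", so the lone "r" is skipped.
const plus = "r re ree".match(/re+/g); // ["re", "ree"]
```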
The period, ., matches any single character except a newline. In /o./, the period after the "o" matches any two-character sequence that starts with "o".
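A minimal sketch of the period as a wildcard, using made-up text. It also shows the flip side: to search for an actual period, you escape it with a backslash.

```javascript
// o. matches an "o" plus exactly one more character of any kind.
const pairs = "dog, to, o!".match(/o./g); // ["og", "o,", "o!"]

// To match a literal period, escape it with a backslash.
const hasPeriod = /\./.test("no period here"); // false
```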
The \w matches word characters: letters, digits, and the underscore. Conversely, \s matches whitespace characters such as spaces, tabs, and newlines. The capital versions flip the meaning: \W matches any character that is not a word character, and \S matches any character that is not whitespace.
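Here's a quick sketch of all four on one invented string. Pairing \w and \S with + groups runs of matching characters into whole chunks.

```javascript
const input = "hi there_1!";

// \w: letters, digits, and underscore. + groups runs of them.
const wordRuns = input.match(/\w+/g);     // ["hi", "there_1"]

// \s: whitespace (spaces, tabs, newlines).
const spaces = input.match(/\s/g);        // [" "]

// \W: anything that is NOT a word character.
const nonWord = input.match(/\W/g);       // [" ", "!"]

// \S: anything that is NOT whitespace, so "!" stays attached.
const nonSpaceRuns = input.match(/\S+/g); // ["hi", "there_1!"]
```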
Inside curly braces, you can say how many times the preceding character should repeat. In /\S{2,3}/, the capital \S matches non-whitespace characters, and the 2 and 3 inside the curly braces ask for between 2 and 3 of them in a row. That picks out words of 2 to 3 characters, though in a longer word it will also match 2- or 3-character chunks.
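A minimal sketch of curly-brace quantifiers on a sample sentence. Note how "store" gets split into chunks; wrapping the pattern in \b (a word boundary, which isn't covered in this post) is one way to keep only whole words.

```javascript
const line = "I go to the store";

// \S{2,3} grabs 2 to 3 non-whitespace characters in a row.
const chunks = line.match(/\S{2,3}/g);         // ["go", "to", "the", "sto", "re"]

// \b marks a word boundary, so only whole 2- to 3-character words match.
const shortWords = line.match(/\b\S{2,3}\b/g); // ["go", "to", "the"]

// A single number means "exactly this many".
const exactTwo = /e{2}/.test("street");        // true
```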
Square brackets match any single one of the characters inside them. I want to check for anything that has an "o" with either a "g" or a "p" after it, and /o[gp]/ accomplishes that.
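A small sketch of a character set on made-up words: each bracket pair stands for exactly one character.

```javascript
// [gp] matches ONE character that is either "g" or "p".
const hits = "dog top ox".match(/o[gp]/g); // ["og", "op"]
```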
Inside square brackets, I can also use the dash, -, to specify a range of characters. So /o[f-r]/ checks for anything that has the letter "o" followed by any letter from "f" through "r".
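A minimal sketch of ranges with invented sample text. The [0-9] line is an extra example of the same idea, since digit ranges are one of the most common uses.

```javascript
// [f-r] matches one character from "f" through "r" (f, g, h, ..., q, r).
const inRange = "fox hop oak orb".match(/o[f-r]/g); // ["op", "or"]

// Ranges work for digits too; + collects the whole run.
const digits = "Room 404".match(/[0-9]+/)[0];       // "404"
```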
The capture group, represented by parentheses, groups characters together so the regex treats them as one unit and remembers what they matched. For example, to look for an "o" followed by either an "s" or a "t", I wrap the "s" and the "t" in parentheses and place a vertical bar, |, between them to mean "or": /o(s|t)/.
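Here's a sketch of a capture group with sample strings I made up. Without the g flag, `.match()` returns the whole match at index 0 and each group's capture after it. The last line shows where groups beat square brackets: alternatives longer than one character.

```javascript
// (s|t) is a group meaning "s" OR "t"; the | is the pipe character.
const both = "host hot hop".match(/o(s|t)/g); // ["os", "ot"]

// Without g, .match() also reports what the group captured.
const first = "lost".match(/o(s|t)/);
const whole = first[0];    // "os": the entire match
const captured = first[1]; // "s": what the group captured

// Groups can hold whole words as alternatives.
const pets = "cats and dogs".match(/(cat|dog)s/g); // ["cats", "dogs"]
```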
All of these can be chained together for many dynamic uses, but even on their own, these basic special characters will let you do most of what you want to accomplish with regex.