Jesse Phillips
Posted on December 30, 2019
Meta
In a previous post about string parsing, I noted that regular expressions were available, but did not dive into an example.
Intro
A regular expression (regex) is a way of defining a search pattern. Searching text is a very common operation in programming, and regex comes with an age-old joke.
I had a problem which I solved with regular expressions
Now you have two problems
To learn what regular expressions look like and how they work, I suggest the help site regularexpressions.info. But you can keep reading to learn about them and how to take advantage of them using D.
Format Phone Number
I wanted to start with something simple, but anything too simple did not seem useful. In this example I decided to show changing from one US phone format to another.
import std.regex;

void main()
{
    auto data = "555.876.9846";
    auto re = regex(r"(\d{3})\.(\d{3})\.(\d{4})");
    auto ans = replaceAll(data, re, "($1) $2-$3");
    assert(ans == "(555) 876-9846");
}
There are a number of things going on in this example beyond the regular expression itself.
- Regex
  - groupings
  - digits
  - repeat
  - escape
- regex string literal
- replacement of all matches
- match group referencing
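The "replacement of all matches" item is easier to see with more than one phone number in the input. Here is a minimal sketch comparing std.regex's replaceFirst and replaceAll; the helper names formatFirst and formatAll are hypothetical and exist only for this illustration.

```d
import std.regex;

// Hypothetical helper: reformat only the first phone number found.
string formatFirst(string text)
{
    auto re = regex(r"(\d{3})\.(\d{3})\.(\d{4})");
    return replaceFirst(text, re, "($1) $2-$3");
}

// Hypothetical helper: reformat every phone number found.
string formatAll(string text)
{
    auto re = regex(r"(\d{3})\.(\d{3})\.(\d{4})");
    return replaceAll(text, re, "($1) $2-$3");
}

void main()
{
    auto data = "555.876.9846 or 555.123.4567";
    // replaceFirst stops after the first match.
    assert(formatFirst(data) == "(555) 876-9846 or 555.123.4567");
    // replaceAll keeps going through the rest of the input.
    assert(formatAll(data) == "(555) 876-9846 or (555) 123-4567");
}
```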
String Literal
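One thing that can be problematic is that the programming language itself gives the backslash a special meaning inside string literals, where it introduces codes for unprintable characters. D provides the regex string literal r"" (formally a wysiwyg string), in which backslashes are taken literally. Otherwise every backslash in the pattern above would have to be doubled. Compare the escaped form with the r"" form: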
"(\\d{3})\\.(\\d{3})\\.(\\d{4})"
r"(\d{3})\.(\d{3})\.(\d{4})"
This string is still problematic if you need to use double quotes. D also has a wysiwyg literal delimited by backticks, which in turn is problematic if you need a backtick. And D provides... Alright, that is enough; you can follow the link above.
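To check that the different literal forms really describe the same pattern, here is a small sketch. The constant names are hypothetical; the point is only that the escaped, r"" and backtick forms produce identical characters.

```d
// Three spellings of the same pattern text (names are illustrative only).
enum escaped  = "(\\d{3})\\.(\\d{3})\\.(\\d{4})";
enum raw      = r"(\d{3})\.(\d{3})\.(\d{4})";
enum backtick = `(\d{3})\.(\d{3})\.(\d{4})`;

// A backtick literal can hold double quotes without escaping.
enum quoted = `"(\d+)"`;

void main()
{
    assert(escaped == raw);
    assert(raw == backtick);
    // The same text written with an ordinary literal needs four escapes.
    assert(quoted == "\"(\\d+)\"");
}
```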
Regex
Digits
\d
The backslash introduces a special "code", in this case d, which stands for digit.
Other common codes
- \w - word character (alphanumeric or underscore)
- \s - whitespace
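A quick way to get a feel for these codes is to test them against small strings. The sketch below assumes a hypothetical helper named found, which just reports whether matchFirst located the pattern anywhere in the text.

```d
import std.regex;

// Hypothetical helper: true when the pattern matches somewhere in text.
bool found(string text, string pattern)
{
    return !matchFirst(text, regex(pattern)).empty;
}

void main()
{
    assert(found("room 7", r"\d"));    // contains a digit
    assert(!found("room", r"\d"));     // no digit anywhere
    assert(found("a_b", r"\w\w\w"));   // underscore counts as a word character
    assert(!found("a b", r"\w\w\w"));  // the space breaks the run of word characters
    assert(found("a b", r"\w\s\w"));   // word character, whitespace, word character
}
```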
Repeat
\d{3}
Here I'm using an explicit number of instances. It will only match if three digits are written in a row. Other common repeat codes:
- \d* - zero or more
- \d+ - one or more
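The difference between the repeat codes shows up when you look at what the first match actually contains. This is a sketch only; firstHit is a hypothetical helper that returns the matched text, or null when nothing matched.

```d
import std.regex;

// Hypothetical helper: text of the first match, or null if there is none.
string firstHit(string text, string pattern)
{
    auto m = matchFirst(text, regex(pattern));
    return m.empty ? null : m.hit;
}

void main()
{
    // Exactly three digits: takes the first three of a longer run.
    assert(firstHit("ab12345", r"\d{3}") == "123");
    // One or more: takes the whole run of digits.
    assert(firstHit("ab12345", r"\d+") == "12345");
    // Zero or more: is satisfied by zero digits, so the first match is
    // the empty string at the very start of the input.
    assert(firstHit("ab12345", r"\d*") == "");
    // Only two digits available, so {3} cannot match at all.
    assert(firstHit("ab12", r"\d{3}") is null);
}
```

Note how \d* matching "nothing" can be surprising; \d+ is usually what I want when at least one digit is required.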
Grouping
Regular expressions allow matched content to be stored in groups. This is done with parentheses.
r"(\d{3})"
This creates match group 1, which is referenced in the replacement string as $1.
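Groups are not limited to replacement strings. As a sketch, matchFirst returns the captures of the first match, which can be indexed by group number, with index 0 holding the whole match. The areaCode helper is hypothetical and only here to illustrate.

```d
import std.regex;

// Hypothetical helper: returns group 1 (the first three digits),
// or null when the input does not contain a matching phone number.
string areaCode(string phone)
{
    auto m = matchFirst(phone, regex(r"(\d{3})\.(\d{3})\.(\d{4})"));
    return m.empty ? null : m[1];
}

void main()
{
    auto m = matchFirst("555.876.9846", regex(r"(\d{3})\.(\d{3})\.(\d{4})"));
    assert(m[0] == "555.876.9846"); // index 0 is the entire match
    assert(m[1] == "555");          // first pair of parentheses
    assert(m[2] == "876");          // second pair
    assert(m[3] == "9846");         // third pair

    assert(areaCode("555.876.9846") == "555");
    assert(areaCode("not a number") is null);
}
```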
Escape Sequence
\.
In regular expressions the dot . has a special meaning: match any character. I wanted to match the literal character, so I escape out of the special processing with a backslash, just like I escape into a special meaning with \d.
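The effect of forgetting the backslash is easy to demonstrate with a replacement. In this sketch the helper names dashDots and dashAny are hypothetical; the only difference between them is whether the dot is escaped.

```d
import std.regex;

// Hypothetical helper: escaped dot, replaces only literal '.' characters.
string dashDots(string s)
{
    return replaceAll(s, regex(r"\."), "-");
}

// Hypothetical helper: bare dot, matches (and replaces) every character.
string dashAny(string s)
{
    return replaceAll(s, regex(r"."), "-");
}

void main()
{
    assert(dashDots("555.876.9846") == "555-876-9846");
    assert(dashAny("a.b") == "---");
}
```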