One Byte Explainer: Regular Expressions
Matt Ellen-Tsivintzeli
Posted on June 18, 2024
This is a submission for DEV Computer Science Challenge v24.06.12: One Byte Explainer.
Explainer
A regular expression (regex) finds patterns in strings with one character of memory. It has an alphabet & defines a language. The alphabet can be any set of characters, including the empty string. Regexes can be joined, joining the alphabets and languages.
Additional Context
Because the original regular expressions only allowed for one character of memory, they had no lookaheads or lookbehinds.
A language that is defined by a regular expression is called a regular language.
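To make the "one character of memory" idea concrete, here is a minimal sketch of a hand-written finite automaton for the pattern `abc*`. The state numbering, the `TRANSITIONS` table, and the `accepts` helper are illustrative assumptions, not part of any regex library; the point is only that the matcher looks at one character at a time and remembers nothing except its current state.

```python
# Hand-written finite automaton for the pattern abc* (an illustrative sketch).
# States: 0 = start, 1 = seen "a", 2 = seen "ab" followed by zero or more "c".
TRANSITIONS = {
    (0, "a"): 1,
    (1, "b"): 2,
    (2, "c"): 2,
}
ACCEPTING = {2}


def accepts(text: str) -> bool:
    """Return True if the whole of text is in the language of abc*."""
    state = 0
    for char in text:
        # Only the current character and the current state are consulted.
        key = (state, char)
        if key not in TRANSITIONS:
            return False
        state = TRANSITIONS[key]
    return state in ACCEPTING


for sample in ["ab", "abc", "abccc", "a", "abd"]:
    print(sample, accepts(sample))
```

The set of strings for which `accepts` returns `True` is the regular language defined by `abc*`.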
Regular expressions have notations that allow them to be defined succinctly. These notations vary depending on the implementation, but usually take the following forms:
- `*` - the character or group preceding this must appear at least 0 times. e.g. `abc*` would match `ab`, `abc`, `abcc`, etc.
- `+` - the character or group preceding this must appear at least once. e.g. `abc+` would match `abc`, `abcc`, etc.
- `?` - the character or group preceding this must appear at most once. e.g. `abc?` would match `ab` or `abc`.
- `.` - this matches any character. e.g. `.` would match `a`, `b`, `c`, etc.
- `[]` - only match the characters inside the square brackets. e.g. `[hjk]` would match `h`, `j`, or `k`.
- `[^]` - only match the characters not inside the square brackets. e.g. `[^abc]` would not match `a`, `b`, or `c`, but would match anything else.
- `()` - the string inside the parentheses is a group. e.g. `(abc)` would match `abc` and the regular expression engine would assign that result a group.
- `(|)` - the group can be either what's on the left or what's on the right of the `|`. e.g. `(abc|def)` would match `abc` or `def`.
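The notations above can be tried out with Python's built-in `re` module, which is just one implementation among many. This is a small sketch: the `EXAMPLES` table and the `check` helper are names made up for illustration, and `re.fullmatch` is used so that each pattern has to match the whole string rather than just part of it.

```python
import re

# (pattern, strings that should match, strings that should not match)
EXAMPLES = [
    (r"abc*", ["ab", "abc", "abcc"], ["a", "abd"]),
    (r"abc+", ["abc", "abcc"], ["ab"]),
    (r"abc?", ["ab", "abc"], ["abcc"]),
    (r".", ["a", "b", "c"], ["", "ab"]),
    (r"[hjk]", ["h", "j", "k"], ["a", "hj"]),
    (r"[^abc]", ["d", "z"], ["a", "b", "c"]),
    (r"(abc)", ["abc"], ["ab"]),
    (r"(abc|def)", ["abc", "def"], ["abcdef"]),
]


def check(pattern: str, text: str) -> bool:
    """Return True if the pattern matches the entire string."""
    return re.fullmatch(pattern, text) is not None


for pattern, should_match, should_not_match in EXAMPLES:
    assert all(check(pattern, s) for s in should_match), pattern
    assert not any(check(pattern, s) for s in should_not_match), pattern

# Parentheses also capture: the engine records what the group matched.
match = re.fullmatch(r"(abc|def)", "def")
print(match.group(1))  # prints "def"
```

Other engines may differ in the details, so treat the table as a check of this one implementation rather than a universal specification.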