World-first Static time RegEx engine with O(0) time complexity
Jakub Švehla
Posted on February 15, 2021
What the heck is that?
-
RegEx
engine written with static types?! - Code which evaluates
RegEx
"templates" in compile time so you know the result before you run your app?! -
RegEx
engine which works withO(0)
run-time complexity?! - Minified 0-bite (GZip) length output?!
- Fully bugged and not ready for production?!
I'm not kidding!!! This is not just a dream!
This is the first world RegEx
engine written in pure Typescript types.
Check the working examples!
Github Repo - ts-generics-RegEx-engine
you can play with the source code here
Disclaimer
- The code is not ready to use in production environment.
- Because of the stack limits of Typescript, some
regEx
s stop working because they are too long and trigger recursion stack overflow known asType instantiation is excessively deep and possibly infinite
. -
RegEx
backtracking is not implemented yet. - The parser supports only a small subset of PCRE standard. Specifically
.?*+()\\
symbols.
Motivation + usage
Thanks to new features of Typescript 4.1.x we are able to parse a string into a Tuple of tokens and much more! So I decided to write my own custom RegEx
engine just by using Typescript static types to demonstrate how powerful the type system of Typescripts is.
How does the RegEx engine work under the hood?
As you may know, programming languages compilers + interpreters. You may know that they are pretty complex and includes Lexers, Parsers, Interpreters, and so on.
On the other side, this small engine is quite simple, so there are just 3 small modules:
- 1. Tokenizer
- 2. Parser
- 3. Interpreter
1. Tokenizer
A small generic type TokenizeString<T>
just parses RegEx
template to tokens which are used as the input for 2. Parser
to build RegEx
Abstract-Syntax-Tree (AST).
Examples:
type T0 = TokenizeString<'\\(+(ab)+'>
type T1 = TokenizeString<'\\(+(a(xy)+(xx)b)+'>
2. Parser
type ParseRegExTokens<T> = ...
takes the tokenized template and does the syntax analysis which produces an Abstract-Syntax-Tree (AST) Model of the RegEx
template.
Examples:
type T3 = ParsedRegEx<TokenizeString<'\\(+(a(xy)+(xx)b)+'>>
As you can see, the parser supports nesting of structures (like brackets in brackets in brackets etc...)
AST for '\\(+(a(xy)+(xx)b)+'
template will look like this:
[{
type: 'element';
value: "(";
quantifier: 'exactlyOne';
}, {
type: 'element';
value: "(";
quantifier: "zeroOrMore";
}, {
type: 'groupElement';
states: [{
type: 'element';
value: "a";
quantifier: 'exactlyOne';
}, {
type: 'groupElement';
states: [{
type: 'element';
value: "x";
quantifier: 'exactlyOne';
}, {
type: 'element';
value: "y";
quantifier: 'exactlyOne';
}];
quantifier: 'exactlyOne';
}, {
...; // and so on
}, {
...; // and so on
}, {
...; // and so on
}];
quantifier: 'exactlyOne';
}]
3. RegEx Interpreter
The last step is to create a proper "interpreter" type Test<RegExp, TestString> = ...
which takes a template and a test string by applying rules from the RegEx
AST.
Examples:
And that's it! 🎉 🎉
If you don't believe, you can check the full source code in this GitHub repo: https://raw.githubusercontent.com/Svehla/ts-generics-RegEx-engine
Wait... And what about the real Javascript
output? Let's check it out!
Haha! A few hundreds line of static types and runtime output is empty with O(0)
time complexity! That's the magic of Typescript 🦄
And what's next?
If you're interested in another advanced usage of the Typescript type system, you can check these step-by-step articles/tutorials on how to create some advanced Typescript generics.
Posted on February 15, 2021
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.