When I prefer not to use Regex ⛞
a-tonchev
Posted on April 14, 2021
Regex is certainly a very useful and powerful tool, but it can very easily become complex and confusing.
In a big project you cannot avoid regular expressions entirely, because there is not an alternative for every case.
But there are some very common cases where you might think: okay, I have to use a regular expression here.
For those cases there are alternatives that you might prefer:
Example – parse the last part of a URL
Say you have the following link:
const link = 'http://www.google.com/m8/feeds/contacts/myemail%40gmail.com/base/nabb80191e23b7d9';
Now you would like to get the last part, which is an ID: nabb80191e23b7d9
With a regex you would do something like this:
const result = link.match(/\/([^\/]+)\/?$/)[1];
This does the job. But the problem is that you have to stop and concentrate to understand what the pattern does. We can make the search simpler with other approaches:
// everything after the last slash
const result = link.slice(link.lastIndexOf('/') + 1);
// OR (only works if the ID always has exactly 16 characters)
const result = link.slice(-16);
// OR
const result = link.split('/').pop();
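For this link, all three give the same result as the regex. One difference: the regex also tolerates a trailing slash (the \/?$ part), while these one-liners return an empty string or the wrong characters if the URL ends with /.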
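To check that the one-liners agree with the regex, here is a small sketch, runnable in Node.js or a browser console. The helper `lastSegment` is a hypothetical name introduced only for illustration: it strips one trailing slash first, so that it behaves like the `\/?$` part of the regex.

```javascript
const link = 'http://www.google.com/m8/feeds/contacts/myemail%40gmail.com/base/nabb80191e23b7d9';

// The regex version from above.
const viaRegex = link.match(/\/([^\/]+)\/?$/)[1];

// The non-regex versions.
const viaLastIndexOf = link.slice(link.lastIndexOf('/') + 1);
const viaSplit = link.split('/').pop();

// Hypothetical helper: ignore one trailing slash, like the regex's `\/?$`,
// then take everything after the last remaining slash.
function lastSegment(url) {
  const trimmed = url.endsWith('/') ? url.slice(0, -1) : url;
  return trimmed.slice(trimmed.lastIndexOf('/') + 1);
}

console.log(viaRegex === viaLastIndexOf && viaRegex === viaSplit); // true
console.log((link + '/').split('/').pop()); // '' (empty string)
console.log(lastSegment(link + '/'));       // 'nabb80191e23b7d9'
```

The helper is still plain string handling, so the edge case stays visible in the code instead of hiding in a quantifier.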
The split('/') approach works the same way with any other separator, for example dashes:
here-is-my-id-nabb80191e23b7d9
or even a custom separator:
here{SPLIT}is{SPLIT}my{SPLIT}id{SPLIT}nabb80191e23b7d9
And so on.
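A minimal sketch of that idea as a reusable function; `lastPart` is a hypothetical helper name, not an existing API. Because String.prototype.split treats a string separator literally, `{SPLIT}` needs no escaping here, whereas in a regex characters like `{` can have special meaning.

```javascript
// Hypothetical helper: return everything after the last occurrence of `separator`.
// String#split treats a string separator literally, so '{SPLIT}' needs no escaping.
// If the separator does not occur at all, the whole input is returned.
function lastPart(text, separator) {
  return text.split(separator).pop();
}

console.log(lastPart('here-is-my-id-nabb80191e23b7d9', '-')); // 'nabb80191e23b7d9'
console.log(lastPart('here{SPLIT}is{SPLIT}my{SPLIT}id{SPLIT}nabb80191e23b7d9', '{SPLIT}')); // 'nabb80191e23b7d9'
console.log(lastPart('no-separator', '/')); // 'no-separator'
```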
One more thing worth mentioning: in many cases the regex is slower. Not always, but often. Of course, performance is rarely the most important thing in a project; especially on the client side the difference will usually not be noticeable and probably doesn't matter. The bigger benefit of the non-regex versions is readability: the intent is clear at a glance, which also makes it easier to see which edge cases are covered.
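The performance claim is easy to check for your own patterns. Below is a rough micro-benchmark sketch, assuming a runtime with a global `performance.now()` (browsers, Node.js 16+). `timeIt` is a hypothetical helper; absolute numbers depend heavily on the engine and its optimizations, so treat the output as a hint, not a verdict.

```javascript
const link = 'http://www.google.com/m8/feeds/contacts/myemail%40gmail.com/base/nabb80191e23b7d9';
const idPattern = /\/([^\/]+)\/?$/; // compiled once, outside the loop

// Hypothetical helper: run `fn` many times and report the elapsed wall-clock time.
function timeIt(label, fn, iterations = 100000) {
  let last;
  const start = performance.now();
  for (let i = 0; i < iterations; i++) {
    last = fn(); // keep the result so the work is not obviously unused
  }
  const ms = performance.now() - start;
  console.log(`${label}: ${ms.toFixed(1)} ms`);
  return { ms, last };
}

const byRegex = timeIt('regex', () => link.match(idPattern)[1]);
const byLastIndexOf = timeIt('lastIndexOf', () => link.slice(link.lastIndexOf('/') + 1));
const bySplit = timeIt('split', () => link.split('/').pop());
```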
Search in HTML:
Now we want to extract the text of all links from an HTML document that we have as a string, e.g.:
const rawHtml = `<html><head><title>titleTest</title></head><body><a href='https://www.test1.com' mydata="13">test01</a><a href='https://www.test2.com'>test02</a><a href='https://www.test3.com'>test03</a></body></html>`;
If we want to get all the text with a regex, we will end up with something like:
const regex = /<a[^>]*>([^<]+)<\/a>/ig;
const result = rawHtml.match(regex).map(function(val){
return val.replace(/<\/?a[^>]*>/g,'');
});
But what happens if I add some tags inside the link tag, e.g. bold text:
....<a href='https://www.test1.com' mydata="13">test01 with some <b>bold text</b> inside</a>....
Then my example no longer works: [^<]+ stops at the <b> tag, so the first link is silently skipped, and I need to adjust the regex.
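Here is a minimal reproduction of that problem, runnable in Node.js or a browser console. It uses a shortened version of rawHtml (without the head) with the bold text added to the first link, and the same regex as above; the only other change is a `|| []` guard, because match() returns null when nothing matches.

```javascript
// Shortened rawHtml: the first link now contains a nested <b> tag.
const rawHtml = `<html><body><a href='https://www.test1.com' mydata="13">test01 with some <b>bold text</b> inside</a><a href='https://www.test2.com'>test02</a><a href='https://www.test3.com'>test03</a></body></html>`;

const regex = /<a[^>]*>([^<]+)<\/a>/ig;
// match() with the g flag returns all full matches (or null if there are none).
const result = (rawHtml.match(regex) || []).map(val => val.replace(/<\/?a[^>]*>/g, ''));

// [^<]+ cannot get past "<b>", so the first link is dropped without any error.
console.log(result); // [ 'test02', 'test03' ]
```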
Another approach is to use a DOMParser directly:
const doc = new DOMParser().parseFromString(rawHtml, "text/html");
const matches = [...doc.querySelectorAll('a')];
// textContent includes the text of nested tags such as <b> and does not depend on layout
const result = matches.map(el => el.textContent);
This gives us the same result, and unlike the regex it also copes with nested tags like <b>. Most importantly, the code is clear and easy to extend. For example, if we want the text of only those links that have the attribute mydata="13", we just adjust the selector:
const matches = [...doc.querySelectorAll('a[mydata="13"]')];
We can select any element this way, not only links. As long as we have valid HTML, it will just work. (DOMParser is a browser API; in Node.js you would need a library such as jsdom.)
Validate URL:
Next, we want to validate a URL. For the regex version, I simply copied this one from Stack Overflow:
function validateUrl(string){
return /(https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|www\.[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9]+\.[^\s]{2,}|www\.[a-zA-Z0-9]+\.[^\s]{2,})/.test(string);
}
Regular URLs like http://google.com or https://something.yahoo.de work fine.
But nowadays you can also use Cyrillic (or other non-Latin) characters in domain names, so a URL like
http://имена.бг
would be rejected as invalid.
IP addresses are a problem too: http://192.168.0.102 is correctly identified as a valid URL, but an invalid IP address such as http://392.168.0.102 is accepted as well.
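These cases are easy to reproduce. The sketch below runs the Stack Overflow regex from above, unchanged, against the URLs mentioned in the text; `validateUrlRegex` is just a renamed copy, so it does not clash with the URL-based version.

```javascript
// Renamed copy of the Stack Overflow regex check from above.
function validateUrlRegex(string) {
  return /(https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|www\.[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9]+\.[^\s]{2,}|www\.[a-zA-Z0-9]+\.[^\s]{2,})/.test(string);
}

const samples = [
  'http://google.com',          // true
  'https://something.yahoo.de', // true
  'http://имена.бг',            // false: [a-zA-Z0-9] does not match Cyrillic letters
  'http://192.168.0.102',       // true
  'http://392.168.0.102',       // true, although 392 is not a valid IPv4 octet
];

for (const s of samples) {
  console.log(s, validateUrlRegex(s));
}
```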
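The non-regex solution is to use a URL object.
This is how it works:
function validateUrl(string) {
  try {
    // the URL constructor throws a TypeError if the string cannot be parsed
    const url = new URL(string);
    // schemes like mailto: or data: parse fine but have the opaque origin 'null'
    if (url.origin !== 'null') return true;
  } catch (e) {
    return false;
  }
  return false;
}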
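This handles all the edge cases mentioned above, and it is also a much cleaner and more understandable solution. One difference to keep in mind: the URL constructor needs a scheme, so a bare www.google.com is rejected, while the regex above accepts it.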
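A sketch of the URL-based check on the same inputs, plus the scheme-less case. The `allowedProtocols` parameter is a hypothetical extension, not part of the original function; it only shows how easily the check can be narrowed, e.g. to web URLs. It assumes a WHATWG-compliant URL implementation (modern browsers, Node.js).

```javascript
// URL-based validation, extended with a hypothetical `allowedProtocols` parameter.
function validateUrl(string, allowedProtocols = ['http:', 'https:']) {
  let url;
  try {
    url = new URL(string); // throws a TypeError if the string cannot be parsed
  } catch (e) {
    return false;
  }
  // Opaque origins ('null'), e.g. for mailto: or data:, are rejected,
  // and so is any protocol that is not explicitly allowed.
  return url.origin !== 'null' && allowedProtocols.includes(url.protocol);
}

console.log(validateUrl('http://google.com'));          // true
console.log(validateUrl('https://something.yahoo.de')); // true
console.log(validateUrl('http://имена.бг'));            // true (the host is converted to punycode)
console.log(validateUrl('http://192.168.0.102'));       // true
console.log(validateUrl('http://392.168.0.102'));       // false: 392 is not a valid IPv4 octet
console.log(validateUrl('www.google.com'));             // false: no scheme
console.log(validateUrl('ftp://example.com/file.txt')); // false with the default allowed list
console.log(validateUrl('ftp://example.com/file.txt', ['ftp:'])); // true
```

Keeping the protocol list as a parameter means the stricter and looser variants share one code path instead of two slightly different patterns.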
The URL object also makes other tasks easy. For example, if we want to read a specific query parameter or set one, we could do something like this:
const myUrl = new URL('https://google.com?test=1#someId');
myUrl.searchParams.get('test'); // '1'
myUrl.searchParams.set('test2', '154'); // the URL is now https://google.com/?test=1&test2=154#someId
We can also easily read the fragment with myUrl.hash (here '#someId').
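A slightly longer sketch of the URLSearchParams API available through `url.searchParams`, runnable in Node.js or a browser. The parameter names (`test2`, `tag`) and the new fragment `otherId` are made up for illustration.

```javascript
const myUrl = new URL('https://google.com?test=1#someId');

// Reading: get() returns a string, or null when the parameter is missing.
const test = myUrl.searchParams.get('test');    // '1'
const missing = myUrl.searchParams.get('nope'); // null

// Writing: set() replaces a value, append() adds another value for the same key.
myUrl.searchParams.set('test2', '154');
myUrl.searchParams.append('tag', 'a');
myUrl.searchParams.append('tag', 'b');
const tags = myUrl.searchParams.getAll('tag');  // ['a', 'b']

// Removing a parameter and changing the fragment.
myUrl.searchParams.delete('test');
myUrl.hash = 'otherId';

console.log(myUrl.href); // 'https://google.com/?test2=154&tag=a&tag=b#otherId'
```

Encoding of special characters in keys and values is handled by the API, which is exactly the kind of detail that is easy to get wrong with hand-written string or regex replacements.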
Validate E-Mail Address
How can we validate an e-mail address without regex?
Well, right now I don't know a better solution, so here I would still use a regex.
But if you think about it, we don't really need to validate every possible e-mail address. If we have a system with e-mail registration, the user has to receive a verification link at an existing address anyway.
That's why, instead of investing a lot of time and effort in covering every possible edge case of e-mail validation, it is enough to have a simple regex check, for example in the UI, just in case the user makes a typo or forgets the domain ending.
One example of such an effort is the regex published at https://emailregex.com/.
It works very well for most use cases, but when I tried it on an e-mail address with Cyrillic characters, it failed to recognize it as valid. So it is not optimal either.
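For comparison, here is a deliberately simple check of the kind suggested above, just enough to catch typos like a missing @ or a missing domain ending. The function name `looksLikeEmail` and the exact pattern are assumptions for illustration, not a standard: it rejects some exotic but valid addresses (e.g. quoted local parts containing spaces) and accepts some invalid ones, which is acceptable only because the verification e-mail does the real check. Because it uses negated character classes, it also accepts addresses with Cyrillic characters.

```javascript
// Hypothetical, deliberately simple sanity check (NOT a full RFC 5322 validator):
// one "@", no whitespace, and a dot followed by an ending of at least two characters.
// [^\s@] matches any character except whitespace and "@", including Cyrillic letters.
function looksLikeEmail(value) {
  return /^[^\s@]+@[^\s@]+\.[^\s@]{2,}$/.test(value.trim());
}

console.log(looksLikeEmail('someone@example.com')); // true
console.log(looksLikeEmail('someone@example'));     // false: missing domain ending
console.log(looksLikeEmail('someone.example.com')); // false: missing @
console.log(looksLikeEmail('имя@пример.бг'));       // true
```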
So, regex is cool, nice and powerful, but not necessarily the best tool for every matching and replacing job.