Stahhp Screening for TLDs in Your Email Fields
Adam Nathaniel Davis
Posted on March 11, 2023
I've had it. I've been quiet on this subject for far too long. And now I feel compelled to finally crank out this angry diatribe. If you're a web developer (and most of the people reading this article are web developers), then for the love of all that is holy, please STAHHP screening email fields against an "approved" set of Top Level Domains (TLDs).
You may think that you're really clever with your super-advanced email validation. You may think that you're forcing users to enter a "valid" email address. But I've seen this done incorrectly sooooo many times that, by this point, I'd bet there's a good chance that you're screwing it up - and pissing off some subset of your users.
History
Almost as soon as you start learning web development, you also learn that you should be validating user inputs. Ideally, you're validating those inputs on the backend and the frontend. (Because it's clunky as hell to submit a form to the server, only to have it spit everything back at you because something didn't look "right".)
And even though frontend validation is only half of the equation, there is a ton of value that can be provided to the user by giving them immediate feedback, in the browser, about fields that don't pass muster. It's elementary to warn a user that a required field is empty or that a given input is too short/long. But for as long as I've been doing this (a quarter-century), it's always been something of a challenge to properly validate email fields.
Originally, email validation was fairly straightforward. Sometimes it was done with regular expressions. Sometimes it was done with more "manual" checks. But the basic validation went something like this:
Ensure that there are no invalid characters in the email address. (For example: a copyright mark - ©. There is no valid email address that contains ©.)
Ensure that even the allowed "special" characters do not repeat. (For example: . is acceptable - and commonly used - in email addresses. But there is no valid email address that contains .. anywhere.)
Ensure that there's one - and only one - @ character in the email address - and that there are non-empty strings on both sides of the @ character.
Ensure that the portion to the right of the @ character contains at least one . character. (The portion to the right of the last . character - after the @ character - is assumed to be the TLD.)
Ensure that the email address's TLD is "valid".
But it's that last point that causes all sorts of problems...
I remember the early regular expressions that I'd see for email validation. They usually took everything after the last . and checked it against a list of "known" TLDs. And, for a little while at least, this was... workable. Because there was a finite - and fairly static - list of valid TLDs.
In the "early days", nearly all valid emails ended in .com or .net or .edu or .gov or .org or any of the country-specific TLDs (e.g., .uk). So most email validation scripts tried to check the last portion of the email address against these "known-good" TLDs.
The TLD boom
But nowadays, there's a huge proliferation of valid, working TLDs that have nothing to do with the old stalwarts like .com or .net. Your website can have a perfectly valid/functional TLD like .pizza or .health or .voyage. And of course, if your web presence can use those TLDs, then it's entirely possible that your email address may also use those TLDs.
Granted, the vast majority of all websites (and hence, all email addresses) still end in a "common" TLD like .com or .net or .org. But every single day there are new websites - and new email addresses - coming online that do NOT use those common TLDs.
There are still sooooo many sites out there that try to do a strict validation of your email address - and they attempt to do this by checking the TLD against a list of "known-good" TLDs. The problem arises because almost none of these sites are fastidious about ensuring that their list of "known-good" TLDs is truly up to date with the actual list of real, live TLDs that are available.
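To make the failure mode concrete, here's a rough sketch of the old-style validation described earlier: the basic structural checks, followed by a TLD allowlist. Everything in it is an assumption for illustration. The function name legacyIsValidEmail, the simplified character check, and the contents of KNOWN_GOOD_TLDS are all hypothetical and aren't taken from any particular site or library.

```javascript
// Hypothetical "old-school" validator: structural checks plus a TLD allowlist.
// The allowlist is an illustrative assumption, deliberately frozen in time.
const KNOWN_GOOD_TLDS = ['com', 'net', 'edu', 'gov', 'org', 'uk'];

function legacyIsValidEmail(email) {
  // Simplified character check: reject anything outside a small "safe" set.
  if (!/^[A-Za-z0-9.@_+-]+$/.test(email)) return false;

  // Allowed "special" characters may not repeat, so no "..".
  if (email.includes('..')) return false;

  // Exactly one "@", with non-empty strings on both sides.
  const parts = email.split('@');
  if (parts.length !== 2 || parts[0] === '' || parts[1] === '') return false;

  // The portion to the right of the "@" must contain at least one ".".
  const domain = parts[1];
  const lastDot = domain.lastIndexOf('.');
  if (lastDot === -1) return false;

  // Everything after the last "." is assumed to be the TLD.
  const tld = domain.slice(lastDot + 1).toLowerCase();

  // This is the step that goes stale as new TLDs come online.
  return KNOWN_GOOD_TLDS.includes(tld);
}

console.log(legacyIsValidEmail('someone@example.com'));   // true
console.log(legacyIsValidEmail('someone@example.codes')); // false
```

The second address is structurally fine. It only fails because nobody added .codes to the list, and that list is the part that needs constant upkeep.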
My own private hell
My CV site is at https://adamdavis.codes. My email address is also hosted under adamdavis․codes. Obviously, it doesn't have a "common" TLD. I've done that for two specific reasons:
When I first set up my site, adamdavis.com simply wasn't available.
Even if adamdavis.com was available, I'm extremely happy with adamdavis.codes. I'm a coder. The .codes TLD is a perfect choice for my CV. And as such, it only makes sense that I'd have an email address under the same TLD.
This isn't the only time I've delved into "uncommon" TLDs. My latest project is https://paintmap.studio. I also previously had an email address with a .voyage TLD.
When I first started using these "uncommon" TLDs, I'd find that my email address would frequently get rejected from all sorts of online forms. The form would give me a validation error, stating that my email address isn't "valid". But... it absolutely IS valid!
To be fair, I have found that email addresses like my personal adamdavis․codes address are indeed "passing" many more form validations nowadays. But it's still far too often that I'm trying to submit an online form - and yet I'm stopped when the website tells me that my perfectly-valid email address is... "invalid".
You know what happens when a website rejects my perfectly-valid email address? Well, if the activity I'm trying to complete is in any way optional, I simply QUIT the process. I've abandoned shopping carts that had hundreds of dollars of items merely because the jank-ass website claimed that my email address was invalid. I've abandoned job applications for the same reason.
Yes, I do have a Gmail account. And in those scenarios where I feel compelled to complete the process, I switch out my preferred .codes email address with my Gmail address. But I don't do this unless I feel that I simply must complete the process. And whether I abandon the process or switch to my Gmail address, the whole failed-validation process simply infuriates me.
When the "new" TLDs first started rolling out, I found this process to be annoying - but understandable. It was easy to see how the web teams supporting these features simply weren't keeping up-to-date with the latest TLD specs. But today? In 2023?? I'm sorry, but it's downright unacceptable.
What do you think you're accomplishing?
Frontend (i.e., JavaScript) form validation is, for the most part, a good thing. The last thing you want to do is give the user a form that allows nearly any completely-illogical value to be submitted. But there's a point where strict validation undermines the user experience. And in some cases, it can downright alienate your users.
Take email validation for example. When I'm implementing email address validation in my forms, I tend to use this NPM package: https://www.npmjs.com/package/@toolz/looks-like-email. (HINT: I wrote this package.) It does exactly what the title implies: It tells me if a given value looks like an email address.
No, it's not an acid test designed to strictly filter out any potential string that could possibly be a bogus email. It doesn't try to match against all known-good TLDs. When I use this package, it's entirely possible that someone may still enter an invalid email address. And you know what? In most cases, I couldn't care less.
Because, if someone manages to sneak an invalid email address past my @toolz/looks-like-email package, they're usually just hurting themselves. For most systems that I build/maintain, an "invalid" email address will simply mean that they don't get the notices they might otherwise expect to receive. But those edge cases would only occur if someone's trying, very hard, to find an invalid address that will pass my filter. And if they're trying that hard to subvert the filter - I don't care. Let them.
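For a sense of how little a "looks like an email" check needs to do, here's a minimal sketch. To be clear, this is NOT the source or the API of @toolz/looks-like-email. The function name looksLikeEmail and the regular expression are assumptions, written only to illustrate the general idea of checking the shape of the value without ever consulting a TLD list.

```javascript
// Hypothetical helper; NOT the implementation or API of @toolz/looks-like-email.
function looksLikeEmail(value) {
  if (typeof value !== 'string') return false;
  const candidate = value.trim();

  // Shape check only:
  //   - something before a single "@" (no whitespace)
  //   - a domain made of two or more non-empty labels separated by "."
  // There is deliberately no TLD allowlist: any final label is accepted.
  return /^[^\s@]+@[^\s@.]+(\.[^\s@.]+)+$/.test(candidate);
}

console.log(looksLikeEmail('someone@example.com'));    // true
console.log(looksLikeEmail('someone@example.codes'));  // true
console.log(looksLikeEmail('someone@example.voyage')); // true
console.log(looksLikeEmail('not an email'));           // false
console.log(looksLikeEmail('missing-domain@'));        // false
```

A check like this will happily let some bogus addresses through. That's the trade-off: it never has to be updated when a new TLD launches, and it never tells a real user that their real address is wrong.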
There are also some times when you may not need to validate an email field at all. (Or, at a maximum, simply validate that some value's been entered.) We've all seen (or worked on) sites where you must verify your email address before the app will allow you to do anything meaningful. In those kinda scenarios, is it really a tragedy if someone puts BS data in the email address field? The only result will be that they won't be able to properly verify their account (and begin using the app) until they do enter a valid email.
Of course, there are many other form elements that can suffer from being overly strict. For example, I once worked at a company where the user was expected to enter first name and last name values. In the interest of trying to provide "complete" frontend validation, someone set those fields to be invalid if either one contained fewer than three characters. You probably know where this is going...
Although it's fairly uncommon in the US to have a first-or-last name that consists of fewer than three characters, those names do exist. In particular, there are many people, especially those of Asian descent, who have first-or-last names that consist of only two characters. Once the app went live, we immediately started receiving complaints that some people could not complete the online form.
I understand that we commonly set first-and-last name fields as being required - meaning that they must contain some type of value. But if someone only puts, say, their first initial in the first-name field, is that really hurting anyone? Is it really gonna crash the system? Or is it just an overly-fastidious frontend developer deciding - on their own - that every user must enter first/last name values that are over a certain length?
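If the real requirement is just "this field can't be blank", the check can be about as small as validation gets. The sketch below is hypothetical (the name isAcceptableName is mine, for illustration only), and it intentionally has no minimum length.

```javascript
// Hypothetical helper: a name is acceptable if anything non-blank was entered.
// There is intentionally no minimum length, so two-character names (and even
// a single initial) are accepted.
function isAcceptableName(value) {
  return typeof value === 'string' && value.trim().length > 0;
}

console.log(isAcceptableName('Li'));  // true
console.log(isAcceptableName('A'));   // true
console.log(isAcceptableName('   ')); // false
```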
Another good example is phone fields. I've seen an increasing number of phone fields that try to tightly restrict the purely-numeric values that you can enter. But what happens if you don't have a direct phone line? What happens if your phone number looks like this:
+1 904-555-1234 ext. 42
Yes, I have seen some online forms that provide a separate field for Extension. But most don't. So if the only way to reach this person is by entering an extension, and you don't let them enter an extension, you're forcing them to only enter the "main" number - which might connect the caller to the company's main line - and the person who answers may not even know how to forward the call to the user who completed the online form.
Here's another example of user input for a phone field that many online forms will try to block:
+1 904-555-1234 (ONLY BEFORE 5PM)
In this example, the user is trying to tell you, in no uncertain terms, that you should not try to call this number after 5PM. But if you've already decided, in your all-knowing form-developer mode, that no one should ever be allowed to enter letters in the Phone field, you're denying the user the ability to provide these sorts of valuable instructions.
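A more forgiving phone field could look something like the sketch below. It's only one possible approach, under assumptions of my own: the helper name checkPhoneInput is hypothetical, and the single rule it enforces (at least one digit somewhere in the value) is a judgment call that you could loosen further. The key point is that the raw text is kept exactly as typed, extensions and notes included.

```javascript
// Hypothetical helper: keep whatever the user typed, as a string.
function checkPhoneInput(value) {
  const raw = String(value ?? '').trim();

  // The only rule (an assumption, not a standard): at least one digit
  // appears somewhere. Letters, spaces, and punctuation are all allowed.
  return { ok: /\d/.test(raw), raw };
}

console.log(checkPhoneInput('+1 904-555-1234 ext. 42').ok);           // true
console.log(checkPhoneInput('+1 904-555-1234 (ONLY BEFORE 5PM)').ok); // true
console.log(checkPhoneInput('call me').ok);                           // false
```

Because the value is stored as-is, whoever eventually picks up the phone sees the extension or the "only before 5PM" note exactly as the user wrote it.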
Don't be cute
The main lesson here is: Don't be cute. Yes, you should strive to provide useful form validation. But if you're patting yourself on the back because you're certain that you've blocked every conceivable edge case... there's a good chance that you've also blocked some valid input. And when you block valid input, you run a severe risk of alienating your users.
Also, try to be realistic about just what risks exist if someone enters "bad" data in a given field. Sure, you may believe that a phone number should only ever be numeric. But is it really hurting anything if you allow letters in that field and store it on the backend as an alphanumeric string?