How to Build HTML Forms Right: Security

austingil

Austin Gil

Posted on December 3, 2020


This is the last article in a series covering all the various aspects of creating forms for the web. Each article can be read independently, but I wrote them in the order that makes the most sense. If you have not read the others, I would encourage you to take a look.

This final article in the series is arguably the most important. It covers security. While the other articles were mainly focused on the frontend, security goes beyond that. We have to consider the current user, other users, and our own security. As such, we will look at the whole application architecture from frontend to backend and beyond.

Encrypt Traffic (SSL)

Before we get too far, I will be using the term “SSL” to refer to a technology used to encrypt traffic on the internet. Technically, I mean Transport Layer Security (TLS), but “SSL” is commonly used and understood to mean the same thing. It’s what gives websites the little green lock in the URL bar and why they start with “http*s*” instead of “http” (no “s”).

Installing an SSL certificate is a best practice for several reasons, security being the most important. Having an SSL certificate lets you encrypt the data passed between the client (your user) and the server (you).

A hacker snooping on a network may inspect the packets of data a user sends. Without an SSL certificate, that data may be easily read as plain text. With an SSL certificate, the data can still be intercepted, but it would be sent as encrypted text which is pretty much useless.

  • Without an SSL certificate: username: NuggetTheMighty; password: ILoveSquirrels
  • With an SSL certificate (encrypted): SIUFJaYQNtsn+y73mfBYv3fVfjJ2GdHl4A7XnTJXxgUyd4/TrU3nN+g3aJ4BVXSJL/E7

This is especially important when creating forms because the whole point of a form is to send data. We owe it to our users to protect their data.

Getting and installing an SSL certificate used to cost time, money, and some technical know-how. Today, many hosting services will provide and install a certificate for you for free. In many cases, they even do so automatically.

If your hosting service does not provide SSL certificates, there are other options:

  • Cloudflare provides a “flexible” SSL through their DNS. It’s free and easy, but probably better to have your own.
  • If your site uses WordPress, there are a few plugins that will set up a certificate in under a minute, via Let’s Encrypt.
  • You can use Certbot to generate and install a certificate for you via Let’s Encrypt.

If you still don’t have an SSL certificate because your host doesn’t provide one, and you don’t have control of the DNS or server code…well, you’re kind of out of luck. You’re going to have to switch hosts, or talk to your server admin, or do something because this should be a hard requirement for any project these days.

Understand GET vs. POST

In a previous article, I said you should always include the method attribute on your forms. The method attribute tells the form whether to submit data using the GET or POST HTTP method. If you omit the attribute, the browser will default to GET. This is important because there are significant differences between GET and POST requests.

GET Request

Take a look at the following form using a GET method. When you submit the form, the data will be submitted to example.com (spoiler alert, it doesn’t actually do anything).

<form action="https://example.com" method="GET" target="_blank">
  <label for="data">Send some sweet data</label>
  <input id="data" name="some-sweet-data"/>
  <button>Submit</button>
</form>

The key thing to notice is the URL after you submit the form. Although the form’s action is “example.com”, the submission URL is “example.com?some-sweet-data=blahblahblah”. Those query parameters correspond to the form inputs’ name attributes. This is how forms using the GET method transfer data: as query string parameters.

Passing data as a URL parameter is relevant to security for a couple of reasons:

  1. For many users, URLs get saved in the browser’s history. Consider a form that sends a credit card number by query parameter while the user is on a public computer, like at a library. Their private data could end up in the browser’s history for the next user to see.
  2. Many servers keep logs of the URLs that received traffic. If sensitive information ends up in server log files, anyone with access to those files could see the data.

POST Request

Fortunately, you can use the POST method to send data without using query parameters. Let’s look at the same form, but using the POST method:

<form action="https://example.com" method="POST" target="_blank">
  <label for="data">Send some sweet data</label>
  <input id="data" name="some-sweet-data"/>
  <button>Submit</button>
</form>

Notice how submitting this form also loads example.com, but this time there is nothing in the query parameters. That’s because on a POST request, data is sent as part of the request body. This makes it more difficult to accidentally leak private information.
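
To make the difference concrete, here is a rough sketch of where the same form data ends up for each method. It relies on the standard URL and URLSearchParams APIs; the describeSubmission helper is a hypothetical name I made up for illustration, not something the browser exposes.

```javascript
// Hypothetical helper: shows where a form's data would travel for a given method.
function describeSubmission(action, method, fields) {
  const encoded = new URLSearchParams(fields).toString();

  if (method.toUpperCase() === 'GET') {
    // GET: the data becomes the query string, so it shows up in the URL,
    // the browser history, and server access logs.
    const url = new URL(action);
    url.search = encoded;
    return { url: url.toString(), body: null };
  }

  // POST: the URL stays clean and the data travels in the request body.
  return { url: new URL(action).toString(), body: encoded };
}

const fields = { 'some-sweet-data': 'blahblahblah' };
console.log(describeSubmission('https://example.com', 'GET', fields));
console.log(describeSubmission('https://example.com', 'POST', fields));
```

Keep in mind that a POST body is only hidden from the URL. Without SSL it is still sent as plain text.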

As a rule of thumb, I use the POST method on all forms for the reasons listed above. The few exceptions to this rule are when I want to let users bookmark their form submission or share it with someone else. For example, check out this form that submits a search to DuckDuckGo:

<form action="https://duckduckgo.com/" method="GET" target="_blank">
  <label for="query">Search</label>
  <input id="query" name="q"/>
  <button>Submit</button>
</form>

After the form is submitted, DuckDuckGo will open with a query parameter in the URL. Then you could, if you wanted, copy this URL and share it with a colleague, or bookmark it for later. This pattern can be very useful, so keep it in mind unless you’re dealing with sensitive data.

Prevent Spam

No one likes spam. And I’ll admit it’s only marginally related to security. It’s worth mentioning here because any time we add a form to a public web page, we are opening up the doors to spam. Forms are meant to be filled out, but sometimes they get filled out by someone, or some*thing*, for nefarious reasons.

So how do we prevent it?

Honeypots

One rudimentary way of preventing spam is called a “honeypot” and the concept is quite simple. If you include a hidden input in your form, you know that a real human should never modify that field. Therefore, if the form is submitted with data for that input, you can assume it was a bot and reject the submission.

In practice, here’s what that input might look like:

  • The name is important so you know what to check on the backend. I used ‘honeypot’, but most folks would recommend calling it something that sounds more legitimate.
  • I used a visually-hidden class to hide the input from users (you can read more about this in the article on accessibility or styling). Bots will still see it.
  • The tabindex="-1" removes the input from keyboard navigation. This is important for assistive technology users (more on this in the accessibility post).
  • Finally, we want to prevent the browser from filling the input automatically, so we disable autocomplete.
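
Going by the description above, the input and a matching server-side check might look roughly like the sketch below. The markup is reconstructed from the bullet points, and the isLikelyBot helper and the shape of the submitted data are assumptions for illustration.

```javascript
// Markup reconstructed from the description above (kept as a string for reference).
const honeypotMarkup = `
<input name="honeypot" class="visually-hidden" tabindex="-1" autocomplete="off">
`;

// Hypothetical server-side check: `fields` is the parsed form body as a plain object.
function isLikelyBot(fields) {
  const value = fields.honeypot;
  // A real user never sees the field, so any non-empty value is suspicious.
  return typeof value === 'string' && value.trim() !== '';
}

console.log(isLikelyBot({ email: 'a@example.com', honeypot: '' }));
console.log(isLikelyBot({ email: 'a@example.com', honeypot: 'buy pills' }));
```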

The good news about this approach is that it can cost very little time and effort to implement. The bad news is that many bots are smart enough to tell when an input is a honeypot and they will skip over it. But hey, even if this stops 10% of spam, the level of effort is worth it.

Security Challenge

A more robust way of preventing spam is to include a challenge that users need to complete to prove they are human. Some basic examples are inputs that ask you to complete an easy math question like “What is 10 + 6?”. Only data with the correct answer will be accepted.

The problem with this method is that, once again, bots can be sophisticated enough to solve these challenges.

The spam dilemma is a constantly evolving game of cat and mouse that has seen challenges become more complex through time. First math questions, then detecting letters or numbers in images.

Probably the most familiar security challenge is reCAPTCHA. It’s a service now owned by Google that shows users a bunch of images they need to identify. It works very well, and it’s free. If you are concerned about user-privacy, you may not want to use Google products. The good news is that there is another service called hCaptcha which is a drop-in replacement.

The security challenge technique is not without downsides:

  • They are more technical to implement.
  • You may need to rely on a 3rd party service.
  • They can have a negative impact on user experience.

WAF & APIs

If spam is becoming a major issue in your forms, you may want to consider reaching for a 3rd party service.

One option is to set up a Web Application Firewall (WAF). A WAF sits in front of your server and prevents traffic from bad actors getting to your website in the first place.

Cloudflare is my preferred vendor. They work at the DNS level and have a very generous free tier. I use this on every domain I own, and so far I haven’t had any problem with spam.

Another option is to use an API service to test incoming form submissions. The most common one I know is Akismet which is part of the Automattic products (they make WordPress). I’ve used this on some WordPress sites and can say that it works well. They also have an API if you don’t use WordPress. CSS Tricks has an article that goes into more depth on 3rd party spam APIs if you’re interested in other options.

I wouldn’t trust any spam prevention technique to be 100% guaranteed. The field is ever-evolving with spammers getting more advanced every year. However, it’s also not the sort of problem I would try to solve until I have it. In which case, you can start with some of the low-hanging fruit and work your way up to a more involved solution.

Considering the level of effort, user experience, cost, and everything else, I would approach things like this:

  1. Set up Cloudflare on your DNS (or some other WAF)
  2. Use honeypots
  3. Integrate a spam detection API
  4. Set up hCaptcha (last resort due to the user experience)

Validate Data

Validation is when you enforce that the data you receive matches what you expect. For example, if I am registering a new user, I want to make sure the email they provide is actually an email address.

There are generally two places that you validate data: client-side and server-side.

Client-Side Validation

Validation on the front end is usually done with HTML attributes or with JavaScript.

For example, if we wanted an input that must be filled out as an email with a maximum length, we could implement it like so:

<form action="https://example.com" method="POST">
  <label for="email">Email</label>
  <input id="email" name="email" type="email" required maxlength="20">

  <button type="submit">Submit</button>
</form>

If a user tries to submit the form without satisfying our requirements, the browser will prevent it and show the user an error message.

If we don’t want to show the built-in validation UI, we can add the novalidate attribute to our form. This will prevent the default validation logic, and we can replace it with our own.

One approach is to use the form’s checkValidity method to see if the form has any invalid inputs. If the form is invalid, we could loop over each input and see exactly what rule is being broken with the ValidityState API:

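const form = document.querySelector('form');

form.addEventListener('submit', (event) => {
  const isValid = form.checkValidity();

  if (!isValid) {
    // With novalidate set, the browser no longer blocks invalid submissions,
    // so we stop this one ourselves and show our own errors instead
    event.preventDefault();

    const inputs = form.querySelectorAll('input');

    for (const input of inputs) {
      // Do some validation logic with the input
      console.log(input.validity);
    }
  }
});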

ValidityState is very handy because it will give us an object where each key/value pair represents a validation attribute and its validity status:

{
  badInput: Boolean
  customError: Boolean
  patternMismatch: Boolean
  rangeOverflow: Boolean
  rangeUnderflow: Boolean
  stepMismatch: Boolean
  tooLong: Boolean
  tooShort: Boolean
  typeMismatch: Boolean
  valid: Boolean
  valueMissing: Boolean
}

This can get us pretty far. We could show specific error messages for each invalid property, or modify the class names on the input (in fact, this is how Vuetensils does validation).
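
As one possible starting point, here is a small sketch that turns a ValidityState into a list of error messages. The message text and the getErrorMessages name are my own placeholders, so treat them as assumptions and swap in whatever suits your project.

```javascript
// Hypothetical messages keyed by ValidityState property name.
const messages = {
  valueMissing: 'This field is required.',
  typeMismatch: 'Please enter a valid value for this field type.',
  tooLong: 'This value is too long.',
  tooShort: 'This value is too short.',
  patternMismatch: 'This value does not match the expected format.',
};

// Accepts a ValidityState (or any object with the same boolean properties)
// and returns the messages for every rule that is currently broken.
function getErrorMessages(validity) {
  if (validity.valid) return [];
  return Object.keys(messages)
    .filter((key) => validity[key] === true)
    .map((key) => messages[key]);
}

// In the browser you would call: getErrorMessages(input.validity)
console.log(getErrorMessages({ valid: false, valueMissing: true, typeMismatch: false }));
```

The function loops over the keys of the messages object rather than the ValidityState itself, which keeps it easy to test with plain objects outside the browser.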

I can’t make assumptions about your implementation so you’ll have to take it from here. If you need something more robust, you can use one of the many JavaScript validation libraries on NPM.

Whether it’s your own implementation or a third-party library, client-side validation suffers from one major flaw.

Any technical user could modify HTML validation attributes, or bypass client-side validation altogether by making an HTTP request outside of your form. This is why it’s important to never trust data from the client. Let me repeat.

Never trust data from the client!

Client-side validation should be used to improve user experience by providing immediate feedback. It should not be your only line of defense for securing your application.

Server-Side Validation

Since we cannot trust data that comes in from the client, we should always validate data on the server side. For simple applications, you can create your own validation logic, but for serious projects I recommend using a dedicated library. Libraries are great for several reasons:

  1. Validation is a solved problem. No need to reinvent the wheel.
  2. Libraries often work better than a custom implementation because they have been tested on more projects.
  3. A library can future-proof validation requirements. They can provide features we don’t need now but may need later.
  4. For server-side projects, we don’t need to worry about bundle size. The impact of adding more dependencies is not as high as on the client.

My preferred validation library at the moment is Yup. I love it!

How you end up doing server-side validation is up to you. In any case, there are some important considerations to make which relate to the front-end. Consider these points when you experience a validation error on the server:

  • Respond with the appropriate HTTP status code (400 in most cases).
  • Provide some sort of clear message on what was invalid.
  • If there are many things to validate (like a JSON object), validate the entire package. Don’t throw an error immediately for the first invalid value. Respond with all the validation issues to avoid multiple requests.
  • Providing unique error codes (e.g. { error: 'INVALID_EMAIL' }) can help the front-end create their own dictionary for error messages.
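
The points above can be sketched as a small hand-rolled validator. This is not how Yup works; it is a plain function written for illustration, and the password field, the error codes, and the validateRegistration name are all assumptions.

```javascript
// Hypothetical validator for a registration payload.
// It checks every field before responding instead of stopping at the first problem.
function validateRegistration(body) {
  const errors = [];
  const email = typeof body.email === 'string' ? body.email.trim() : '';
  const password = typeof body.password === 'string' ? body.password : '';

  if (email === '') {
    errors.push({ field: 'email', error: 'REQUIRED' });
  } else if (!/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email)) {
    // Deliberately loose pattern: it only catches obviously malformed addresses.
    errors.push({ field: 'email', error: 'INVALID_EMAIL' });
  }

  if (password.length < 8) {
    errors.push({ field: 'password', error: 'PASSWORD_TOO_SHORT' });
  }

  // 400 with the full list of problems, or 200 when everything passed.
  return errors.length > 0
    ? { status: 400, body: { errors } }
    : { status: 200, body: { ok: true } };
}

console.log(JSON.stringify(validateRegistration({ email: 'nope', password: '123' })));
```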

Sanitize/Escape Data

Like validation, data sanitizing (also known as escaping) is a practice that belongs on the server. Sanitizing data is when you transform or remove dangerous data. It’s different than validation because you don’t reject the input. You modify it so it’s safe to use.

For example, let’s say you have a form that asks for a first name and last name. A user might enter the following:

First name: l33t; DROP TABLE user

Last name: <script>alert('h4x0r')</script>

This person is most likely a liar that should not be trusted. Also, their data could subject you to TWO types of attacks: SQL injection and cross-site scripting (XSS).

If you try adding the user’s first name to the database as is, you might drop the entire user table. Hello SQL Injection. If you save the last name as is, your database would be fine, but if you add that last name to your HTML, it could inject arbitrary JavaScript onto the page. The JavaScript in the example is harmless, but what if it transferred user secrets? Oof, XSS attack.

This series focuses on HTML forms so we won’t get into the depths of XSS or SQL Injection. For in-depth prevention, I would recommend the OWASP cheatsheet series for XSS and SQL Injection.

The point I want to focus on is we could avoid both of the scenarios above by sanitizing the data. My advice, once again, is to lean on libraries that specialize in talking to databases.

For SQL databases, I recommend using an Object-Relational Mapping (ORM) library or a query builder instead of writing raw SQL. Many of them automatically sanitize data by passing values to the database separately from the SQL text. For JavaScript projects, I really like Knex.js and Objection.js.
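
If you are curious what those libraries do for you, here is a rough sketch of the underlying idea: keep the SQL text fixed and send user values separately. The { text, values } shape and the $1 placeholder follow the style used by some PostgreSQL clients such as node-postgres. The function names and the users table are hypothetical, and the hostile value is a variation on the example above that includes a closing quote.

```javascript
// Unsafe: user input is spliced into the SQL text and can change the statement.
function buildInsertUserUnsafe(firstName) {
  return `INSERT INTO users (first_name) VALUES ('${firstName}')`;
}

// Safer: the SQL text never changes, and the value travels separately as a parameter.
function buildInsertUser(firstName) {
  return {
    text: 'INSERT INTO users (first_name) VALUES ($1)',
    values: [firstName],
  };
}

const hostile = "x'); DROP TABLE users; --";
console.log(buildInsertUserUnsafe(hostile)); // the DROP TABLE is now part of the SQL
console.log(buildInsertUser(hostile));       // the SQL text is untouched
```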

Whenever you are adding user-generated content to HTML you must sanitize strings to avoid XSS attacks. A library I’ve used before is XSS. You can sanitize content in a few different places:

  • Before you save it to your database.
  • After you read it from your database.
  • Before you write it to an HTML document.

The safest place to sanitize content is right before adding it to the HTML document. However, I like to follow a zero-trust pattern and just assume the worst-case scenario. In which case, it’s best to do all three. Call me paranoid.
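
For a sense of what escaping looks like, here is a minimal sketch that replaces the characters HTML treats as special. It is only meant for text placed inside element content or quoted attribute values, and the escapeHtml name is my own. For anything beyond that, a dedicated library is the safer bet.

```javascript
// Replaces the five characters that are significant in HTML with their entities.
function escapeHtml(value) {
  const replacements = {
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
    '"': '&quot;',
    "'": '&#39;',
  };
  return String(value).replace(/[&<>"']/g, (char) => replacements[char]);
}

const lastName = "<script>alert('h4x0r')</script>";
console.log(escapeHtml(lastName));
```

After escaping, the browser renders the last name as visible text instead of running it as a script.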

It’s also worth mentioning that using a front-end framework to create your HTML documents can help as well. Many frameworks, such as Vue.js and React, automatically escape content before adding it to the page, unless you explicitly tell them not to.

Handle JWTs Properly

JSON Web Tokens (JWTs) are a really cool technology that was created to solve the modern challenge of sending data around to several services while avoiding the need for a centralized service to check the validity of the data.

In other words, we can store authentication details about a user inside a JWT and, because the token is signed, we can be certain that its contents cannot be changed without the change being detected. Then we can send this token to an API, and that API does not need to check with any central database to know which user made the request. The API can simply open the JWT and see the authentication details for the user. It’s great.

Authentication is one of the main ways JWTs are used today. However, JWTs do have some significant downsides:

  • The contents of the JWT are not hidden to anyone that gains access to it.
  • JWTs can have an expiration date, but they cannot be programmatically invalidated.
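
To see why the first point matters, here is a quick sketch showing that anyone holding a token can read its payload without knowing the signing secret. The token is built by hand for illustration (its signature segment is just a placeholder), and readJwtPayload is a hypothetical helper. It assumes a Node.js version that supports the base64url Buffer encoding.

```javascript
// Build an example token by hand: header.payload.signature
const toBase64Url = (obj) => Buffer.from(JSON.stringify(obj)).toString('base64url');

const exampleToken = [
  toBase64Url({ alg: 'HS256', typ: 'JWT' }),
  toBase64Url({ userId: 5, role: 'admin' }),
  'signature-goes-here', // placeholder, not a real signature
].join('.');

// Reading the payload needs no secret: it is only base64url-encoded JSON.
function readJwtPayload(token) {
  const payloadPart = token.split('.')[1];
  return JSON.parse(Buffer.from(payloadPart, 'base64url').toString('utf8'));
}

console.log(readJwtPayload(exampleToken));
```

The signature protects the payload from being modified, not from being read.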

For these two reasons, we should be especially thoughtful when working with JWTs. Unfortunately, most of the tutorials I’ve seen instruct developers to create authentication tokens (JWTs) with somewhat long expiration dates and to store the tokens in localStorage. I have issues with this.

The problem with storing sensitive data in a JWT on the client (localStorage, sessionStorage, IndexedDB, etc.) is that it’s accessible to any JavaScript on the page. It may be a cross-site script as well as any script we did not write ourselves: libraries and frameworks, assets from public CDNs, 3rd party snippets, even browser extensions.

My other issue relates to token expiration. If a user with an “ADMIN” role logs into our application, they would receive an auth token that says they are an “ADMIN”. They could therefore perform “ADMIN” related actions (like create or delete other users) until the token is lost or expires. If our auth token has an expiry time a week into the future, it could theoretically take a week for any change we make to be finalized. What if the “ADMIN” role was a human error, and we actually meant to assign the “GUEST” role to this user? Do you see the problem?

This brings me to my cardinal rules for JWT handling:

  1. Any JWT that contains sensitive/private/authentication data (user ID, personal identifying information, etc) should only be stored in memory.
  2. Every JWT should have an expiration date. Any JWT used for authentication or authorization (‘auth’) should have a very short expiration (e.g. 15 minutes, 24 hours, etc.).

These rules solve our security issues, but present us with a couple of user experience challenges. By only storing auth tokens in memory, the user will have to log in every time the application loads. And if our auth tokens use a 15-minute expiration, then the user will effectively be ‘logged out’ every 15 minutes.

The solution to these issues is best explained by the excellent article, “The Ultimate Guide to handling JWTs on frontend clients (GraphQL)” written by Vladimir Novick. It’s a bit complicated, but I’ll try my best to simplify things in an example:

  • You create two authentication routes. One for logging into the application (/login) and one for generating a new auth token (/refresh).
  • When a user logs in, an auth token is returned that contains any data required to authorize a request (e.g. {userId: 5, role: 'admin'}). It has a short expiry (e.g. 15 minutes).
  • The login response also returns a refresh token. This token only contains the information necessary to recreate a new auth token (e.g. {userId: 5}). It can have a longer expiry to match how long you want a user to stay ‘logged in’ for. Let’s say a week.
  • A user logs in by sending their credentials to the login route, and in return, they get one auth token and one refresh token.
  • The auth token gets saved in memory, and the refresh token can be put in localStorage (it doesn’t usually matter if someone knows my user’s ID).
  • After login, we also set an interval for 14 minutes (less than the auth token expiry). On this interval, we send the refresh token to the /refresh route, and exchange it for a new auth token.
  • This new auth token can replace the old one, and the user remains ‘logged in’.
  • The last step is to make sure to check localStorage for existing refresh tokens any time the application starts. If there is a refresh token, we hit the /refresh route before the app loads. That way we can keep a user ‘logged in’ across multiple sessions.
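
The server side of the flow above can be sketched as follows. This is a simplified illustration and not the code from the article I mentioned: the signing is hand-rolled with Node's built-in crypto module so the example stays self-contained, whereas a real project should use a maintained JWT library. The secret, the kind claim, and the login, refresh, and getRole names are all assumptions.

```javascript
const crypto = require('crypto');

// Assumption: a shared signing secret. Load it from configuration in practice.
const SECRET = 'replace-with-a-real-secret';
const AUTH_TTL_SECONDS = 15 * 60;             // short-lived auth token
const REFRESH_TTL_SECONDS = 7 * 24 * 60 * 60; // refresh token lasts a week

const encode = (obj) => Buffer.from(JSON.stringify(obj)).toString('base64url');
const hmac = (data) => crypto.createHmac('sha256', SECRET).update(data).digest('base64url');

// Creates a signed token that expires `ttlSeconds` after `nowMs`.
function sign(payload, ttlSeconds, nowMs) {
  const header = encode({ alg: 'HS256', typ: 'JWT' });
  const body = encode({ ...payload, exp: Math.floor(nowMs / 1000) + ttlSeconds });
  const unsigned = header + '.' + body;
  return unsigned + '.' + hmac(unsigned);
}

// Returns the payload if the signature is valid and the token has not expired, else null.
function verify(token, nowMs) {
  const [header, body, signature] = String(token).split('.');
  if (!header || !body || !signature) return null;
  const expected = Buffer.from(hmac(header + '.' + body));
  const actual = Buffer.from(signature);
  if (actual.length !== expected.length || !crypto.timingSafeEqual(actual, expected)) return null;
  const payload = JSON.parse(Buffer.from(body, 'base64url').toString('utf8'));
  return payload.exp > Math.floor(nowMs / 1000) ? payload : null;
}

// /login: called after the user's credentials have been checked.
function login(user, nowMs = Date.now()) {
  return {
    authToken: sign({ kind: 'auth', userId: user.id, role: user.role }, AUTH_TTL_SECONDS, nowMs),
    refreshToken: sign({ kind: 'refresh', userId: user.id }, REFRESH_TTL_SECONDS, nowMs),
  };
}

// /refresh: exchanges a valid refresh token for a fresh auth token.
// `getRole` looks up the user's current role, so a role change takes effect on the next refresh.
function refresh(refreshToken, getRole, nowMs = Date.now()) {
  const payload = verify(refreshToken, nowMs);
  if (!payload || payload.kind !== 'refresh') return null;
  const role = getRole(payload.userId);
  return sign({ kind: 'auth', userId: payload.userId, role }, AUTH_TTL_SECONDS, nowMs);
}
```

Because the role is looked up again on every refresh, the accidental “ADMIN” from earlier would be corrected within 15 minutes instead of a week.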

This JWT login flow is quite complicated, but I hope I did it justice. To fully describe it requires a dedicated article, so I would invite you to read the article I mentioned above. It’s excellent.

Protect Against CSRF Attacks

Cross-Site Request Forgery (CSRF) attacks are a bit complicated to understand, but they work by tricking users into making a request on the attacker’s behalf. A theoretical example is probably best to explain.

Imagine your bank has a form to send money from your account to another user’s account. This form sends money by making a POST request to some endpoint such as yourbank.com/send-money with two data values:

  • to: The user ID receiving the money
  • amount: The amount you want to send (obviously).

For security reasons, this only works if you are logged in (also obviously). The server could authenticate the request via HTTP cookies.

In this hypothetical scenario, this form may be vulnerable to CSRF attacks. If an attacker knows enough about how the bank’s backend works, they could create a form disguised as a button that promises kittens.

<form action="http://example.com/send-money" method="POST">

  <input type="hidden" name="to" value="123456"/>
  <input type="hidden" name="amount" value="100"/>

  <button type="submit">Click for Kittens!!!</button>
</form>

Notice how the form above takes advantage of a couple of hidden inputs with the values setting the to and amount data. To an unsuspecting user, this form will visually present as a button promising kittens (evil, I know).

If you were to click this button, it would submit the form to your bank’s /send-money endpoint, and if you’re already logged in with a valid cookie in your browser, that cookie will be sent along with the form submission. This could be enough to trick a user into sending money to someone else.

It’s also worth noting that this attack could happen in a number of ways. It could exist on a random website, within an email, in a browser extension, and more. And if JavaScript is enabled, which it most likely is, it can even happen without any user interaction. So how do we protect against this?

CSRF Tokens

One way to prevent this from happening is by using “CSRF tokens”. These are unique values that are generated on the server that only the server knows about. They are provided to a form in order to be used as the value of a hidden input like this:
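
A minimal sketch of such an input, along with one way a server might issue and check the token. It assumes Node's built-in crypto module and an in-memory Map keyed by session ID; the csrfToken field name and the helper names are hypothetical.

```javascript
const crypto = require('crypto');

// Hypothetical store: remembers which token was issued to which session.
const tokensBySession = new Map();

// Generates an unguessable token and remembers it for this session.
function issueCsrfToken(sessionId) {
  const token = crypto.randomBytes(32).toString('hex');
  tokensBySession.set(sessionId, token);
  return token;
}

// The hidden input the server would render inside the form.
function renderCsrfInput(token) {
  return `<input type="hidden" name="csrfToken" value="${token}">`;
}

// Checks the submitted token against the one issued to this session.
function isValidCsrfToken(sessionId, submittedToken) {
  const expected = tokensBySession.get(sessionId);
  if (!expected || typeof submittedToken !== 'string') return false;
  const a = Buffer.from(expected);
  const b = Buffer.from(submittedToken);
  // timingSafeEqual requires equal lengths, so compare lengths first.
  return a.length === b.length && crypto.timingSafeEqual(a, b);
}

const token = issueCsrfToken('session-abc');
console.log(renderCsrfInput(token));
```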

With the input containing the CSRF token in place, the form can be submitted, and the backend can check the validity of the token. Any form that includes a valid token can continue on the request. Any form submitted with an invalid or missing token is rejected.

If a hacker wants to create the same form as the one above, they will not be able to generate their own CSRF token (assuming you do have a way to validate the tokens).

The tricky part here is getting the CSRF token in a way no one else can. If you are creating the form on the same server, it’s easy enough to generate a token and then pop it into the HTML. If you are working with an API then you need a route that provides valid CSRF tokens. You should configure this route to only allow traffic from known domains. That way you can make a request for a token from a valid domain, but hackers will not be able to.

Validate Request Origin

A basic but clever approach to prevent CSRF attacks is to check the request’s Origin and/or Referer headers. The Referer header contains the URL of the page that made the request, and the Origin header contains just its scheme, host, and port.

The best thing about these headers is that they are set by the browser and cannot be modified by JavaScript running on the page. So no funny business. How you access these headers will depend on the technology you use. For example, if I am using Express, I can create a middleware that looks something like this:

app.use((request, response, next) => {
  const allowedHosts = new Set([request.headers.host]);
  let referer = request.headers.host;
  let origin = null;

  if (request.headers.referer) {
    referer = new URL(request.headers.referer).host;
  }
  if (request.headers.origin) {
    origin = new URL(request.headers.origin).host;
  }

  if (!allowedHosts.has((origin || referer))) {
    return next(new Error('Unallowed origin'));
  }

  next();
});

Here is what the middleware does:

  • Create a list of all the allowed hosts (in our case, only our same app domain is valid)
  • Check if the referer and/or origin headers are present. If so, grab their URL
  • If neither the origin nor the referer URLs are within our list of allowed hosts, we reject the request.

This snippet is good for an example, but you may need something more robust for production purposes. In any case, it can be implemented with a few lines of code, which I always appreciate.

For more details on CSRF attacks, OWASP has an excellent article with more descriptions. They also have an article in their cheatsheet series with more details on preventing CSRF attacks. In fact, they are an excellent resource for anything related to security and I would highly recommend you take some time to read through their content.

For my fellow JavaScript developers out there, Auth0 has a nice article specific to Node.js development and CSRF prevention.

Secure Cookies

As mentioned above, CSRF attacks use cookies as part of their attack vector. So it makes sense that a good way to protect against cookie-based attacks is to make sure our cookies are secure.

For those unfamiliar, a cookie is a small piece of data that travels in HTTP headers. More specifically, cookies are assigned with the Set-Cookie header and look like this: Set-Cookie: <name>=<value>; <attributes>.

An example might look like:

Set-Cookie: sessionId=38afes7a8; Domain=example.com; Max-Age=2592000; Secure; HttpOnly; SameSite=strict;

Some of the attributes relevant to security are:

  • Expires and Max-Age: Allows you to set a time limit on the cookie’s validity.
  • Secure: Ensures that the cookie will only be sent if the request is made over a secure (HTTPS) connection. Useful for preventing man-in-the-middle attacks.
  • HttpOnly: Prevents JavaScript from having access to the cookie. Useful for preventing XSS attacks.
  • SameSite: Can be set to only send cookies if the request origin matches the target domain. Useful for preventing CSRF attacks.
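
To show how these attributes fit together, here is a small sketch that builds a Set-Cookie value by hand. The serializeCookie helper and its option names are my own invention for illustration. Most server frameworks ship their own cookie helper, which you should prefer in a real project.

```javascript
// Hypothetical helper that assembles a Set-Cookie header value.
function serializeCookie(name, value, options = {}) {
  const parts = [`${name}=${encodeURIComponent(value)}`];

  if (options.domain) parts.push(`Domain=${options.domain}`);
  if (Number.isInteger(options.maxAge)) parts.push(`Max-Age=${options.maxAge}`); // seconds
  if (options.secure) parts.push('Secure');       // only sent over HTTPS
  if (options.httpOnly) parts.push('HttpOnly');   // hidden from JavaScript
  if (options.sameSite) parts.push(`SameSite=${options.sameSite}`);

  return parts.join('; ');
}

const header = serializeCookie('sessionId', '38afes7a8', {
  domain: 'example.com',
  maxAge: 60 * 60 * 24 * 30, // 30 days
  secure: true,
  httpOnly: true,
  sameSite: 'Strict',
});
console.log(`Set-Cookie: ${header}`);
```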

These are all the attributes that I think relate to security. But as you can see, only the SameSite cookie attribute is relevant for CSRF attacks. This is a relatively recent addition to the web platform and is great news for security. However, because it’s somewhat new, it won’t be effective on older browsers.

If you want to read more about working with cookies, I would recommend the MDN docs.

Closing Thoughts

I realize that some of the content in this post is only tangentially related to writing forms. Some of the advice here is not directly related to forms at all. However, I hope you agree that it’s relevant information. We must keep these things in mind as we are writing forms for the web. Even if we are not the ones implementing these changes, we should think about our software holistically in order to keep ourselves and our users safe.

This article took about 20 hours to research and create. The best way to show me that you enjoyed it is to share it. You can also sign up for my newsletter or follow me on Twitter if you want to be the first to know when new articles come out.

And if you missed any of the other articles, please consider giving them a read. I think you’ll enjoy those too.

- Part 5: Security

This article was originally published on austingil.com.
