What Nobody Tells You: Must-Knows About REST as a Developer

Originally posted @ hashnode.

I see this topic come up everywhere, and while one article came close to hitting the nail on the head, none really teach the knowledge you, as a developer, must have in order to build a proper REST API server.

Just like Model Features, this is something you won't find anywhere else. This is my recollection of REST knowledge that experience has granted me while architecting more than a few REST services so far.

Prologue

Yes, the architectural constraints you've seen are part of the REST definition and it is good to be aware of them. But all that theory is just that: Theory. Did you know that the definition of REST doesn't mention HTTP? REST could be developed as an RPC server if you wanted. It talks about hyperlinks and hypertext, but doesn't say "REST must be implemented using the HTTP communication protocol".

Recently I read an article that implied REST is a protocol just like SOAP. Nothing could be further from the truth. You could do REST with SOAP if you wanted. SOAP is always POST'ed, so you would identify the action you want to take inside the SOAP message. But I digress. The point here is that REST and SOAP are completely unrelated things.

REST is NOT a protocol, no matter how much you see this "protocol" word attached to it. REST is just architectural constraints (or requisites).

Of the REST requisites, the following are of more relevance to you, the developer:

Stateless
Cacheable

So let's start with what really matters about REST to developers.

Resources

Wait, what? That's not in the prologue. That's right. It's not. This is mentioned almost casually by several blogs/articles and nobody pays attention. As it turns out, resources are key to the developer.

A resource is an entity available in your REST server. Examples of resources are:

The defined application users
Country information
Documents
Sales records

Pretty much, resources are the data the server serves.

REST indicates that every resource must be uniquely identified by a URI (Uniform Resource Identifier). Feel free to read about URI's all you want. Learn about the scheme, the authority and whatnot. Bottom line, and long story short: URL's (Uniform Resource Locators) satisfy the definition of what a URI is.

I do not wish to look this up, but somewhere years ago I saw a purist REST article that stated that REST HTTP server URL's should be of the form http://api.example.com/?<URI>, but because URL's satisfy the URI definition, it is OK to do it as 99.999% of people do.

So one of the first tasks for you, the developer, is to define the resources and assign unique URI's to each one of them. Like this:
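A minimal sketch, assuming a base URL of `http://api.example.com` and collection names (`users`, `countries`, `documents`, `sales`) chosen only for illustration; the helper names are hypothetical:

```typescript
// Hypothetical base URL and collection names, chosen only for illustration.
const BASE_URL = "http://api.example.com";

type Collection = "users" | "countries" | "documents" | "sales";

// URI of a whole collection, e.g. http://api.example.com/users
function collectionUri(collection: Collection): string {
  return BASE_URL + "/" + collection;
}

// URI of a single resource, e.g. http://api.example.com/users/123
function resourceUri(collection: Collection, id: number | string): string {
  return collectionUri(collection) + "/" + encodeURIComponent(String(id));
}
```

Every resource ends up with exactly one address, and the collection it belongs to has one too.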

Now when you combine the URI's with the various HTTP verbs (POST, PUT, DELETE, GET, PATCH) you obtain a REST-compliant HTTP server.

Creating More Elaborate URI's

Ok, the examples above are rather simple. Truth be told, real-world scenarios require more versatility, especially in the realm of master-detail data, where one resource can have resources of other types associated with it in a one-to-many relationship, for example.

By using the User and Country resources as a subject of study, let's say you can have many users associated with a single country. REST purists will tell you that there should be one and only one URI per resource. If you were to follow this strict point of view, you would have to rely on the query string to find out which users belong to a given country: http://api.example.com/users?country=http%3A%2F%2Fapi.example.com%2Fcountries%2F123. That would be even using the full URI of the country of interest, URL-encoded of course. But that's too much. Let's use this one: http://api.example.com/users?country=123.

That's not too bad, I guess. In practice, however, it's awkward to implement. Imagine having to program NodeJS Express routers based on query strings, or .Net controllers that share the same route and are distinguished only by their query strings. Madness.

So many people prefer to overthrow the one-and-only-one-URI-per-resource rule and create more meaningful alternate routes that implicitly provide the query.

To continue with the example, what do you think about this one? http://api.example.com/countries/123/users. Personally, I love it. I do all my REST like this.
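One way to picture it: the nested route is just shorthand for the query-string form. A minimal sketch, assuming a hypothetical `toUserFilter` helper that is not part of Express, ASP.Net or any other framework:

```typescript
// Filter that the users repository would eventually receive.
interface UserFilter {
  countryId?: number;
}

// Translates a request path into the equivalent filter, so that
// /countries/123/users means the same as /users?country=123.
// Returns null when the path is not one of the supported shapes.
function toUserFilter(path: string): UserFilter | null {
  const nested = /^\/countries\/(\d+)\/users\/?$/.exec(path);
  if (nested !== null) {
    return { countryId: Number(nested[1]) };
  }
  const flat = /^\/users\/?(?:\?country=(\d+))?$/.exec(path);
  if (flat !== null) {
    // No query string means "all users".
    return flat[1] === undefined ? {} : { countryId: Number(flat[1]) };
  }
  return null;
}
```

Both shapes end up producing the same filter, so the alternate route costs nothing at the data layer.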

I must warn you, though, that not all the awkwardness goes away, but it is a better choice in practical terms, which is the aspect of REST I want to teach with this article: Practicality.

Stateless Constraint

Nowadays, it is rather simple to fulfill this one: By using a JSON Web Token, your server is largely relieved from any need for per-user state, such as sessions. Still, in practice, it is sometimes unavoidable to have some form of state. While I have no authority to say "don't worry about it" and all will be fine and forgiven, I do that: I don't worry about it. If I must, then I must.

Cacheable Constraint

There isn't much to say about this. In this modern day and age, you may delegate this task to your reverse proxy and forget about it. If you, however, lack a reverse proxy or any other piece that may satisfy this for you, then you'll probably have to do this yourself using whatever tools are available to you and make sense for your system architecture.

Having said that: Remember that web browsers have their own private cache, and if an HTTP response says it is cacheable, the web browser will do its best to respond with the cached version. All you have to do is set some headers in your HTTP responses and you're golden.
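A minimal sketch of what setting those headers can look like, assuming a hypothetical `cacheHeaders` helper and values picked only for illustration. `Cache-Control` and `Vary` are standard HTTP response headers; everything else here is made up for the example:

```typescript
type HeaderMap = { [headerName: string]: string };

// Builds response headers for a cacheable or a non-cacheable response.
// maxAgeSeconds is how long a cache may reuse the response without asking again.
function cacheHeaders(cacheable: boolean, maxAgeSeconds: number): HeaderMap {
  if (!cacheable) {
    // no-store tells browsers and proxies not to keep a copy at all.
    return { "Cache-Control": "no-store" };
  }
  return {
    // private: only the browser's own cache may store it, not shared proxies.
    "Cache-Control": "private, max-age=" + maxAgeSeconds,
    // The representation depends on the Accept header, so caches must key on it.
    "Vary": "Accept",
  };
}
```

Whether `private` or `public` is the right choice depends on whether the response is specific to one user.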

The entire cache topic is rather large, and I have never personally had to implement it in any way or form, except maybe setting the headers such as the Vary header.

If you would like to read about caching, maybe you can start here.

HTTP Verbs, Response Codes And More Practical Wisdom

Ok, now comes the really interesting part for developers!

This is also something that isn't written in stone, and at the end of the day, you return what makes sense to the consumer of your resources. Still, the conventions below are not hard to follow, so do your best.

HTTP GET

GET is used to retrieve resources. These are the HTTP responses that you commonly see in RESTful implementations for GET requests.

Example	HTTP Status Code	Notes
http://api.example.com/users	200 OK	At least one user is returned in the response. This queries for the entire collection.
http://api.example.com/users	204 NO CONTENT	The `users` collection is empty.
http://api.example.com/users/123	200 OK	The user represented by the URI (`http://api.example.com/users/123`) exists and has been returned in the response.
http://api.example.com/users/123	404 NOT FOUND	There is no user that matches the specified URI.
http://api.example.com/users/123	410 GONE	A special case instead of 404. If the `User` resource can be soft-deleted, you may deny its `GET` operation with this status code. In practice, though, there's always a need for soft-deleted data somewhere, so think hard before using this.
http://api.example.com/users/abc	400 BAD REQUEST	Optional. Maybe users are only enumerated, so "abc" is invalid. But maybe users CAN be retrieved by username, so maybe a 400 doesn't apply? It all depends on the implementation.
http://api.example.com/users/?active=true	204 NO CONTENT	The search yielded no results (no active users found). Don't use 404 for this case because 404 is an error code, and there's nothing wrong with finding nothing every now and then.
http://api.example.com/users/?active=true	200 OK	At least one user satisfied the query condition and has been returned in the response.
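
The table boils down to a small decision. A minimal sketch, assuming a hypothetical `Lookup` result produced by whatever data layer you use:

```typescript
// Status for GET on a collection or a search: finding nothing is not an error.
function statusForCollection(items: unknown[]): number {
  return items.length > 0 ? 200 : 204;
}

// Outcome of looking up a single resource by the ID in the URI.
interface Lookup {
  idIsValid: boolean;   // false for e.g. "abc" when only numeric IDs exist
  found: boolean;
  softDeleted: boolean;
}

// Status for GET on a single resource.
function statusForResource(lookup: Lookup): number {
  if (!lookup.idIsValid) return 400;
  if (!lookup.found) return 404;
  if (lookup.softDeleted) return 410;
  return 200;
}
```

The 400 and 410 branches are the optional ones; drop them if they make no sense for your resource.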

HTTP PUT

PUT is used to add a new resource or to replace the existing resource with a new one. Does this sound strange to you? Let's rephrase: PUT'ing a resource to a URL (URI) stores said resource as a new resource if the specified URL (URI) is not in use by any other resource. If it is in use, however, the PUT'ed resource will take its place.

In Practical Terms

It is an UPSERT operation. But wait a minute: How can a consumer of the REST HTTP server know which ID to use, since in practice, 99.9999% of the time the ID is given by an auto-incrementing numeric field in some relational database, and therefore cannot be known ahead of time? The plot thickens!

Ha! No, not really. This just means that a resource that lives in a REST server can only be UPSERT'ed if it contains an alternative key. An alternative key is any other piece of information about the resource that is unique amongst its peers. Examples would be a person's social security or DNI number, or a container's assigned serial number. Resources that can be identified by alternative keys can be PUT'ed.

This means that we will ignore once more the one-URI-per-resource rule and allow URI's for UPSERT'able resources using the alternative key: HTTP PUT http://api.example.com/users/webJose with the request body containing the details. Whether user webJose exists or not has no relevance to the outcome: A user whose username is webJose will exist from now onwards (assuming all data validation checks pass).

Back to the typical HTTP status codes returned by PUT.

Example	HTTP Status Code	Notes
http://api.example.com/users/webJose	200 OK	The resource existed and was updated.
http://api.example.com/users/webJose	201 CREATED	The resource did not exist and was created. The HTTP response will carry the `Location` header containing the new URI (`http://api.example.com/users/23`).
http://api.example.com/users/webJose	400 BAD REQUEST	Either the URI, the body payload or the request headers are incorrect.
http://api.example.com/users/webJose	409 CONFLICT	The data in the request conflicts or somehow contradicts what is expected. Typically used with timestamp verification (a. k. a. `rowversion` in SQL Server).
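
A minimal sketch of the upsert behavior, assuming a hypothetical in-memory `UserStore` keyed by username (the alternative key) and a made-up `AppUser` shape:

```typescript
interface AppUser {
  id: number;
  username: string;
  fullName: string;
}

interface PutResult {
  statusCode: number;
  headers: { [headerName: string]: string };
  body: AppUser;
}

class UserStore {
  // Prototype-less object so that any username is a safe key.
  private byUsername: { [username: string]: AppUser } = Object.create(null);
  private nextId = 1;

  // PUT /users/{username}: replace when the key is taken, create otherwise.
  put(username: string, fullName: string): PutResult {
    const existing = this.byUsername[username];
    if (existing !== undefined) {
      const updated: AppUser = { id: existing.id, username, fullName };
      this.byUsername[username] = updated;
      return { statusCode: 200, headers: {}, body: updated };
    }
    const created: AppUser = { id: this.nextId++, username, fullName };
    this.byUsername[username] = created;
    // The Location header points at the canonical, ID-based URI.
    return {
      statusCode: 201,
      headers: { Location: "http://api.example.com/users/" + created.id },
      body: created,
    };
  }
}
```

Calling `put` twice with the same username yields 201 the first time and 200 the second, and the ID stays the same.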

HTTP POST

POST inserts a new resource. The URI one specifies for this operation is the resource's parent URI (the collection URI).

Example	HTTP Status Code	Notes
http://api.example.com/users	201 CREATED	The resource was created. The HTTP response will carry the `Location` header containing the new URI (`http://api.example.com/users/23`).
http://api.example.com/users	400 BAD REQUEST	Either the URI, the body payload or the request headers are incorrect.

POST'ing is traditionally used to post a single resource, but REST doesn't really impose any requirements around this. The problem here to resolve as a developer is: How do you respond to a bulk request? HTTP headers are limited in size. Thinking that you can fit thousands of URI's in the HTTP Location header is not realistic. It is therefore more than likely that you'll have to drop the 201 CREATED HTTP response status code and use the 200 OK status code and then transmit the new URI's in the response body.
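
A minimal sketch of both response shapes, assuming a hypothetical `respondToPost` helper that receives the IDs the data layer just generated:

```typescript
interface PostResult {
  statusCode: number;
  headers: { [headerName: string]: string };
  body: { uris: string[] } | null;
}

const USERS_URI = "http://api.example.com/users";

// One new ID: 201 CREATED with a Location header.
// Several new IDs: 200 OK with the new URIs in the body, since they
// cannot realistically fit in a single header.
// An empty list is not handled specially in this sketch.
function respondToPost(newIds: number[]): PostResult {
  const uris: string[] = newIds.map((id: number) => USERS_URI + "/" + id);
  const onlyUri = uris[0];
  if (uris.length === 1 && onlyUri !== undefined) {
    return { statusCode: 201, headers: { Location: onlyUri }, body: null };
  }
  return { statusCode: 200, headers: {}, body: { uris } };
}
```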

Still, I don't like this because of something I personally do that has proven very helpful. I will talk about this at the end of the article.

HTTP PATCH

PATCH is used to update a resource, and is probably the second simplest HTTP verb to understand. If you need to make changes to a resource, you send an HTTP PATCH request using the resource's URL (URI) and the information that changes.

Practicality Of "that changes"

I hear you: There's always a catch, and the catch for PATCH is those two words in the heading: "that changes".

The theory states that patching does not require the full resource in the request. Patching should work by only receiving the pieces of the resource that change. This sounds nice but in practice is painful to implement.

For example, to implement this behavior in ASP.Net you will have to define the resource model twice: A model that represents the resource, and a model that is used to transmit patch information.

Call me crazy, or call me lazy, I don't care. I hate the idea of modeling a resource twice. Unless you are absolutely against the wall on this one, just require the entire resource, changes included and then validate the resource's data that is allowed to change. Then make sure your repository ignores values on fields that are not allowed to change.

For example, people forget that the resource's ID will be in two places: The URI and the body payload. What I do here is make sure the deserialized body payload ID matches the ID in the URI. If it is not the case, I return 400 BAD REQUEST, or I simply override and continue.
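
A minimal sketch of this approach, assuming a made-up `UserRecord` shape in which only `fullName` is allowed to change. It takes the strict route and returns 400 on an ID mismatch:

```typescript
interface UserRecord {
  id: number;
  username: string;  // not allowed to change in this sketch
  fullName: string;  // allowed to change
}

interface PatchResult {
  statusCode: number;
  body: UserRecord | null;
}

// uriId comes from the URI (/users/{id}); payload is the full resource from the body.
function patchUser(stored: UserRecord, uriId: number, payload: UserRecord): PatchResult {
  // The ID travels twice; reject the request when the two copies disagree.
  if (payload.id !== uriId) {
    return { statusCode: 400, body: null };
  }
  // Copy only the fields that may change; everything else keeps its stored value.
  const updated: UserRecord = {
    id: stored.id,
    username: stored.username,
    fullName: payload.fullName,
  };
  return { statusCode: 200, body: updated };
}
```

One resource model serves both reading and patching, and a sneaky change to `username` in the payload is simply ignored.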

NOTE: This is super simple to achieve with Dapper in ASP.Net. With Entity Framework you'll have to first query for the resource, then apply the changes in the returned object for the properties that are allowed to change, and then save the changes. One more reason to hate EF: It costs you a round trip to the database just for it to learn what you already knew.

Ok, let's move to the typical HTTP responses table.

Example	HTTP Status Code	Notes
http://api.example.com/users	405 METHOD NOT ALLOWED	Collections are not `PATCH`'able.
http://api.example.com/users/123	200 OK	The resource has been updated.
http://api.example.com/users/123	400 BAD REQUEST	Either the URI, the body payload or the request headers are incorrect.
http://api.example.com/users/123	409 CONFLICT	The data in the request conflicts or somehow contradicts what is expected. Typically used with timestamp verification (a. k. a. `rowversion` in SQL Server).

HTTP DELETE

DELETE is used to delete a resource. Shocking, I know. While this HTTP verb is capable of carrying a body, one is largely unneeded. Right now I cannot remember a single instance where I needed to send body information during a deletion request. At most, you need to send the known timestamp (rowversion), and this can be transmitted easily by using the query string.

Example	HTTP Status Code	Notes
http://api.example.com/users	200 OK	Deletes the entire users collection.
http://api.example.com/users/123	200 OK	Deletes the single user associated with the URL (URI).
http://api.example.com/users/webJose	200 OK	Deletes the single user associated with the URL (URI).
http://api.example.com/users/123	405 METHOD NOT ALLOWED	A user cannot be deleted, and this is something you as a developer may enforce for resources where business rules forbid deletion.
http://api.example.com/users/123	409 CONFLICT	The data in the request conflicts or somehow contradicts what is expected. Typically used with timestamp verification (a. k. a. `rowversion` in SQL Server).
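
A minimal sketch of the timestamp verification, assuming an in-memory table and a request of the form `DELETE /users/{id}?rowversion={n}`. The query-string parameter name is made up, and returning 404 for an unknown ID is this sketch's choice, since the table above does not cover that case:

```typescript
interface StoredRow {
  id: number;
  rowVersion: number;  // bumped by the data store on every change
}

// Handles the deletion against an in-memory table and returns the
// HTTP status code the server would send.
function deleteUser(table: StoredRow[], id: number, knownRowVersion: number): number {
  for (let i = 0; i < table.length; i++) {
    const row = table[i];
    if (row !== undefined && row.id === id) {
      // Someone changed the row after the caller last read it.
      if (row.rowVersion !== knownRowVersion) return 409;
      table.splice(i, 1);
      return 200;
    }
  }
  return 404;
}
```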

This HTTP verb may also be used for soft deletions. The consumer of your REST server doesn't have to know you soft-deleted. Neither REST nor the HTTP specification forces you to reveal this implementation aspect.

Some More HTTP Responses

On top of what has been specified so far, you may also make use of other HTTP status codes.

HTTP Status Code	Notes
202 ACCEPTED	The HTTP request has been received and this response simply acknowledges this fact. Whether or not it succeeds is unknown. Typically used for fast-processing endpoints where the result is not immediately needed, such as a log-receiving microservice. The response may carry an identifier to later query for the status of the request.
401 UNAUTHORIZED	The request did not carry any recognizable authentication information for the requested operation. An authentication method exists.
403 FORBIDDEN	The request carries proper credentials but said credentials don't grant the necessary permissions to perform the requested operation, or a suitable authentication method does not exist.
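418 I'M A TEAPOT	An April Fools' joke that originated in RFC 2324 (the Hyper Text Coffee Pot Control Protocol) rather than in the HTTP standard itself, although many HTTP stacks recognize it. Use it if you don't want to serve the request for whatever (superfluous or petty) reason.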
429 TOO MANY REQUESTS	Usually provided by throttling middleware and makes sure your HTTP server is not overwhelmed with too many HTTP requests. Requests that exceed the threshold receive this error.
500 INTERNAL SERVER ERROR	Return this if an unhandled exception occurs during the processing of a request.
503 SERVICE UNAVAILABLE	Especially useful in Microservices where the queried microservice has emptied its data store in response to a data replay request. While the data is being replayed by the Record Of Origin, the microservice returns 503 for all received requests until the data replay is finished. Do send meaningful explanations to the caller about the nature of the unavailability.
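
The difference between 401 and 403 is the one people mix up the most. A minimal sketch, assuming a hypothetical `Caller` object produced by your authentication step and permission names made up for the example:

```typescript
// What the authentication step learned about the caller. A null caller means
// the request carried no recognizable credentials.
interface Caller {
  permissions: string[];
}

// Picks the status code for a protected operation: 401 when the caller is
// unknown, 403 when known but not allowed, 200 otherwise.
function authorize(caller: Caller | null, requiredPermission: string): number {
  if (caller === null) return 401;
  if (caller.permissions.indexOf(requiredPermission) < 0) return 403;
  return 200;
}
```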

What Was the Thing I Personally Do That Has Proven Very Helpful?

So while explaining the possibility of allowing bulk operations I said there's only one practical way to respond successfully: 200 OK with the new resource URI's in the response body. I also said, however, that I don't like this.

I like to always return the updated version of a resource after a data-altering operation (POST, PUT, PATCH and DELETE). I do this because there's always some information that gets updated that the requestor doesn't know about. Such as? Some examples are:

The ID of a newly created resource
The last modified date
The new resource's timestamp (rowversion)
The last modified by field

This has proven useful because most likely there's a user interface behind the request that needs to refresh its view. Returning the updated resource saves one round trip to the API server.
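
A minimal sketch of the idea, assuming made-up `NewDocument` and `StoredDocument` shapes and a hypothetical `createDocument` function standing in for your repository:

```typescript
interface NewDocument {
  title: string;
}

interface StoredDocument {
  id: number;
  title: string;
  lastModified: string;    // ISO 8601 timestamp set by the server
  lastModifiedBy: string;
  rowVersion: number;
}

// Creates the resource and returns the stored version, including every field
// the server filled in, so the caller does not need a follow-up GET.
function createDocument(
  input: NewDocument,
  nextId: number,
  currentUser: string,
  now: Date
): StoredDocument {
  return {
    id: nextId,
    title: input.title,
    lastModified: now.toISOString(),
    lastModifiedBy: currentUser,
    rowVersion: 1,
  };
}
```

The caller sent only a title, yet the response carries the four server-side values from the list above.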

Tips for Bulk Operations

There are two ways you can program bulk operations:

Synchronously, only returning a response after all resources have been processed.
Asynchronously, queueing the resource-altering tasks and responding with 202 ACCEPTED.

The first one has nothing special: Simply process and return 200 OK or whatever result is relevant.

For the second one, consider adding a unique operation identifier the requestor can later use to obtain the request's result.
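
A minimal sketch of the asynchronous flavor, assuming a hypothetical in-memory `BulkQueue` and an `op-N` identifier format invented for the example. A real system would use a proper queue and a background worker:

```typescript
type OperationState = "pending" | "done";

interface Operation {
  id: string;
  state: OperationState;
  createdUris: string[];
}

class BulkQueue {
  private operations: { [id: string]: Operation } = Object.create(null);
  private counter = 0;

  // Accepts the work and answers immediately with 202 and an operation ID.
  enqueue(): { statusCode: number; operationId: string } {
    this.counter += 1;
    const id = "op-" + this.counter;
    this.operations[id] = { id, state: "pending", createdUris: [] };
    return { statusCode: 202, operationId: id };
  }

  // Called by the background worker once the resources have been processed.
  complete(id: string, createdUris: string[]): void {
    const op = this.operations[id];
    if (op !== undefined) {
      op.state = "done";
      op.createdUris = createdUris;
    }
  }

  // Lets the requestor query for the outcome later on.
  status(id: string): Operation | null {
    const op = this.operations[id];
    return op === undefined ? null : op;
  }
}
```

The requestor keeps the operation ID from the 202 response and polls a status endpoint with it until the state is `done`.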

Conclusion

REST is a very general, even abstract concept that must not be confused with a protocol or thought of as being the same as HTTP (the P stands for "protocol" in any case). I think the best resource to start reading about what REST is, is this one.

I'll finish by quoting the above resource:

Roy Fielding (the author of REST), in his dissertation, has nowhere mentioned any implementation direction – including any protocol preference or even HTTP.

If you want to read Roy's dissertation, go here.

That's it for today, happy coding!