URL Shortener: Java & Spring complete tutorial

Introduction

A URL shortener is a service that creates short links from very long URLs. Usually, short links are one-third or even one-fourth the size of the original URL, which makes them easier to type, present, or tweet. When users click on a short link, they are automatically redirected to the original URL.

There are many URL shortening services available online, like tiny.cc, bitly.com, cutt.ly, etc. Implementing a URL shortening service is not a complex task, and it is often part of system design interviews. In this post, I will try to explain the process of implementing the service.

Theory

Before implementation, it is always a good idea to write down what needs to be done in the form of functional and non-functional requirements.

Functional requirements:

Users need to be able to enter a long URL. Our service should save that URL and generate a short link.
Users should have the option to enter an expiration date. After that date has passed, the short link should be invalid.
Clicking on the short link should redirect the user to the original long URL.
Users should create an account to use the service. The service can have a usage limit per user*
Users are allowed to create their own custom short links*
Service should have metrics, for example, most visited links*

Non-functional requirements:

Service should be up and running 100% of the time
Redirecting should not last longer than two seconds

*Requirements are optional

URL conversion:

The most important thing in a URL shortener is the conversion algorithm. Let's say that we want short links with a maximum length of 7. URL conversion can be implemented in several different ways, and each way has its pros and cons.

One way of generating short links would be hashing the original URL with some hash function (for example MD5 or SHA-2). With a good hash function, different inputs are very likely (though not guaranteed) to produce different outputs. The result of the hash is longer than seven characters, so we would need to take the first seven characters. But in this case there could be a collision, because those first seven characters could already be in use as a short link. Then we take the next seven characters, and so on, until we find a short link that is not used.
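
The hashing approach can be sketched roughly as follows. This is only an illustration under a few assumptions: the class name HashShortener is made up, an in-memory Set stands in for the database of used short links, and "the next seven characters" is read as sliding the window forward by one character at a time.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.Set;

public class HashShortener {
    private static final int LENGTH = 7;

    // Stand-in for the database of short links already in use (assumption).
    private final Set<String> used = new HashSet<>();

    /** Returns the 32-character hexadecimal MD5 digest of the input. */
    public static String md5Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            // %032x keeps leading zeros so the result is always 32 characters.
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    /** Slides a 7-character window over the hash until an unused value is found. */
    public String shorten(String longUrl) {
        String hash = md5Hex(longUrl);
        for (int start = 0; start + LENGTH <= hash.length(); start++) {
            String candidate = hash.substring(start, start + LENGTH);
            if (used.add(candidate)) { // add() returns false if already present
                return candidate;
            }
        }
        throw new IllegalStateException("No free 7-character window in the hash");
    }
}
```

Note that shortening the same URL twice yields two different short links here, because the first window is already taken on the second call.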

The second way of generating a short link is by using UUIDs. The probability that a UUID will be duplicated is not zero, but it is close enough to zero to be negligible. Since a UUID has 36 characters, that means that we have the same problem as above. We should take the first seven characters and check if that combination is already in use.

The third way is converting numbers from base 10 to base 62. A base is the number of distinct digits or characters used to represent a number. Base 10 uses the digits [0-9], which we use in everyday life, and base 62 uses [0-9][a-z][A-Z]. This means that, for example, a four-digit number in base 10 can be written in base 62 with only two or three characters.

Using base 62 in URL conversion with a maximum length of seven characters allows us to have 62^7 (about 3.5 trillion) unique values for short links.

So how does base 62 conversion work?

We have a base 10 number that we want to convert to base 62. We are going to use the following algorithm:

    while(number > 0)
        remainder = number % 62
        number = number / 62 // integer division
        attach remainder to start of result collection

After that, we just need to map the numbers from the result collection to characters of the base 62 alphabet = [0,1,2,...,a,b,c,...,A,B,C,...].

Let's see how this works with a real example. In this example, let's convert 1000 from base 10 to base 62.

    1st iteration:
        number = 1000
        remainder = 1000 % 62 = 8
        number = 1000 / 62 = 16
        result list = [8]
    2nd iteration:
        number = 16
        remainder = 16 % 62 = 16
        number = 16 / 62 = 0
        result list = [16,8]
        There are no more iterations since number = 0 after the 2nd iteration

Mapping [16,8] to the base 62 alphabet gives g8. This means that 1000 in base 10 equals g8 in base 62.

Converting from base 62 to base 10 is also simple:

    i = 0
    result = 0
    while(i < inputString length)
        counter = i + 1
        mapped = base62alphabet.indexOf(inputString[i]) // map character to number based on its index in alphabet
        result = result + mapped * 62^(inputString length - counter)
        i++

Real example:

    inputString = g8
    inputString length = 2
    i = 0
    result = 0
    1st iteration
        counter = 1
        mapped = 16 // index of g in base62alphabet is 16
        result = 0 + 16 * 62^1 = 992
    2nd iteration
        counter = 2
        mapped = 8 // index of 8 in base62alphabet is 8
        result = 992 + 8 * 62^0 = 1000

Implementation

Note: The whole solution is on my GitHub. I implemented this service using Spring Boot and MySQL.

We are going to use our database's auto-increment feature. The auto-incrementing number is going to be used for base 62 conversion. You can use any other database that has an auto-increment feature.

First, visit Spring Initializr and select Spring Web, Spring Data JPA, and MySQL Driver (Spring Data JPA is what provides the JpaRepository used later). After that, click on the Generate button and download the zip file. Unzip the file and open the project in your favorite IDE.
Every time I start a new project, I like to create some folders to logically divide my code. My folders in this case are controller, entity, service, repository, dto, and config.

Inside the entity folder, let's create a Url.java class with four attributes: id, longUrl, createdDate, expiresDate.

Notice that there is no short link attribute. We won't save short links. Instead, we convert between the id attribute and the short link on every request: from base 10 to base 62 when a short link is created, and from base 62 back to base 10 on each GET request. This way, we are saving space in our database.

The longUrl attribute is the URL we should redirect to once a user accesses a short link. The createdDate attribute just records when the longUrl was saved (it is not important), and expiresDate is there if a user wants to make a short link unavailable after some time.

Next, let's create BaseService.java in the service folder. BaseService contains methods to convert from base 10 to base 62 and vice versa.

    private static final String allowedString = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    private char[] allowedCharacters = allowedString.toCharArray();
    private int base = allowedCharacters.length;

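Like I mentioned before, if we want to use base 62 conversions, we need to have a base 62 alphabet, which in this case is called allowedCharacters. Note that this alphabet starts with the letters and ends with the digits, so it is ordered differently from the one in the theory section: with this ordering, 1000 maps to qi instead of g8. Also, the value of the base variable is calculated from the length of the allowed characters in case we want to change the allowed characters.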

The encode method takes a number as input and returns a short link. The decode method takes a string (short link) as an input and returns a number. The algorithms should be implemented as they were explained above.
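
As a rough sketch, the two methods could look like the code below. It reuses the three fields shown above, leaves out any Spring annotations so that it runs on its own, and is not necessarily identical to the code in the repository. The decode method multiplies the running result by the base at each step, which is equivalent to summing mapped * 62^position as described in the theory section. Returning the first alphabet character for 0 is my own choice for an input the algorithm above does not cover.

```java
public class BaseService {
    private static final String allowedString =
            "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    private final char[] allowedCharacters = allowedString.toCharArray();
    private final int base = allowedCharacters.length;

    /** Converts a non-negative base 10 number (the id) to a base 62 short link. */
    public String encode(long input) {
        if (input < 0) {
            throw new IllegalArgumentException("Input must not be negative");
        }
        if (input == 0) {
            // The loop below would produce an empty string for 0.
            return String.valueOf(allowedCharacters[0]);
        }
        StringBuilder encoded = new StringBuilder();
        while (input > 0) {
            // Remainders come out least significant first, so reverse at the end.
            encoded.append(allowedCharacters[(int) (input % base)]);
            input = input / base;
        }
        return encoded.reverse().toString();
    }

    /** Converts a base 62 short link back to the base 10 number (the id). */
    public long decode(String input) {
        long decoded = 0;
        for (char c : input.toCharArray()) {
            int index = allowedString.indexOf(c);
            if (index < 0) {
                throw new IllegalArgumentException("Invalid character: " + c);
            }
            decoded = decoded * base + index;
        }
        return decoded;
    }
}
```

With this alphabet, the largest seven-character short link is 9999999, which corresponds to the id 62^7 - 1.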

After that, inside the repository folder, let's create the UrlRepository.java file, which is just an extension of JpaRepository and gives us a lot of methods like 'findById', 'save', etc. We don't need to add anything else to it.

Then, let's create a UrlController.java file in the controller folder. The controller should have one POST method for creating short links and one GET method for redirecting to the original URL.

    @PostMapping("create-short")
    public String convertToShortUrl(@RequestBody UrlLongRequest request) {
        return urlService.convertToShortUrl(request);
    }

    @GetMapping(value = "{shortUrl}")
    public ResponseEntity<Void> getAndRedirect(@PathVariable String shortUrl) {
        var url = urlService.getOriginalUrl(shortUrl);
        return ResponseEntity.status(HttpStatus.FOUND)
                .location(URI.create(url))
                .build();
    }

The POST method has a UrlLongRequest as its request body. It is just a class with longUrl and expiresDate attributes.

The GET method takes a short url as a path variable and then gets and redirects to the original url.
At the top of the controller, UrlService is injected as a dependency, which will be explained next.

UrlService.java is where most of the logic lives, and it is the service used by the controller. The convertToShortUrl method is used by the POST method from the controller. It just creates a new record in the database and gets an id. That id is then converted to a base 62 short link and returned to the controller.

The getOriginalUrl method is used by the GET method from the controller. It first converts the short link string to base 10, and the result of that is an id. Then, it gets a record from the database by that id and throws an exception if it does not exist. After that, it returns the original URL to the controller.
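
To make the flow of these two methods concrete, here is a simplified sketch in plain Java. It is not the Spring service from the repository: an in-memory map and a counter stand in for the database table and its auto-increment id, the encode and decode functions are passed in from outside, and the expiration check is my assumption about how the expiresDate requirement could be enforced. The names UrlServiceSketch and StoredUrl are hypothetical.

```java
import java.time.Instant;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongFunction;
import java.util.function.ToLongFunction;

public class UrlServiceSketch {
    /** One stored row: the long URL and an optional expiration time. */
    static final class StoredUrl {
        final String longUrl;
        final Instant expiresDate; // null means the link never expires

        StoredUrl(String longUrl, Instant expiresDate) {
            this.longUrl = longUrl;
            this.expiresDate = expiresDate;
        }
    }

    private final Map<Long, StoredUrl> table = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong(1); // mimics auto-increment
    private final LongFunction<String> encode;
    private final ToLongFunction<String> decode;

    public UrlServiceSketch(LongFunction<String> encode, ToLongFunction<String> decode) {
        this.encode = encode;
        this.decode = decode;
    }

    /** Saves the long URL, then turns the generated id into a short link. */
    public String convertToShortUrl(String longUrl, Instant expiresDate) {
        long id = nextId.getAndIncrement();
        table.put(id, new StoredUrl(longUrl, expiresDate));
        return encode.apply(id);
    }

    /** Turns the short link back into an id and looks up the long URL. */
    public String getOriginalUrl(String shortUrl) {
        long id = decode.applyAsLong(shortUrl);
        StoredUrl row = table.get(id);
        if (row == null) {
            throw new NoSuchElementException("No URL found for " + shortUrl);
        }
        if (row.expiresDate != null && row.expiresDate.isBefore(Instant.now())) {
            table.remove(id);
            throw new NoSuchElementException("Link expired: " + shortUrl);
        }
        return row.longUrl;
    }
}
```

In the real service, the two functions would be the encode and decode methods of BaseService, and the map would be replaced by calls to UrlRepository.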

And that is it for part one. In the next part I will focus on some more 'advanced' stuff.

'Advanced' topics

In this part, I will talk about Swagger documentation, dockerization of the application, application cache, and a MySQL scheduled event.

Swagger UI

Every time you develop an API, it is good to document it in some way. Documentation makes APIs easier to understand and use. The API in this project is documented using Swagger UI.

Swagger UI allows anyone to visualize and interact with the API’s resources without having any of the implementation logic in place. It’s automatically generated, with the visual documentation making it easy for back end implementation and client-side consumption.

There are several steps that we need to do to include Swagger UI in the project.

First, we need to add Maven dependencies to the pom.xml file:

    <dependency>
        <groupId>io.springfox</groupId>
        <artifactId>springfox-swagger2</artifactId>
        <version>2.9.2</version>
    </dependency>
    <dependency>
        <groupId>io.springfox</groupId>
        <artifactId>springfox-swagger-ui</artifactId>
        <version>2.9.2</version>
    </dependency>

For reference, you can see the complete pom.xml file here.
After adding the Maven dependencies, it is time to add Swagger configuration.
Inside the config folder, we need to create a new class - SwaggerConfig.java

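    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import springfox.documentation.builders.ApiInfoBuilder;
    import springfox.documentation.builders.RequestHandlerSelectors;
    import springfox.documentation.service.ApiInfo;
    import springfox.documentation.spi.DocumentationType;
    import springfox.documentation.spring.web.plugins.Docket;
    import springfox.documentation.swagger2.annotations.EnableSwagger2;

    @Configuration
    @EnableSwagger2
    public class SwaggerConfig {

        // Primary Swagger configuration: documents only the given base package
        @Bean
        public Docket apiDocket() {
            return new Docket(DocumentationType.SWAGGER_2)
                .apiInfo(metadata())
                .select()
                .apis(RequestHandlerSelectors.basePackage("com.amarin"))
                .build();
        }

        // General API information shown at the top of Swagger UI
        private ApiInfo metadata() {
            return new ApiInfoBuilder()
                .title("Url shortener API")
                .description("API reference for developers")
                .version("1.0")
                .build();
        }
    }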

At the top of the class, we need to add a couple of annotations.

@Configuration indicates that a class declares one or more @Bean methods and may be processed by the Spring container to generate bean definitions and service requests for those beans at runtime.

@EnableSwagger2 indicates that Swagger support should be enabled.

Next, we should add a Docket bean, which provides the primary API configuration with sensible defaults and convenience methods for configuration.

The apiInfo() method takes the ApiInfo object where we can configure all necessary API information - otherwise, it uses some default values. To make the code cleaner, we should make a private method that configures and returns the ApiInfo object and pass the result of that method as the parameter of the apiInfo() method. In this case it is the metadata() method.

The apis() method allows us to filter packages that are being documented.

Now Swagger UI is configured and we can start documenting our API. Inside UrlController, above every endpoint, we can use the @ApiOperation annotation to add a description. Depending on your needs, you can use some other annotations.

It is also possible to document DTOs using @ApiModelProperty, which allows you to add allowed values, descriptions, etc.

Caching

According to Wikipedia, a cache is a hardware or software component that stores data so that future requests for that data can be served faster; the data stored in a cache might be the result of an earlier computation or a copy of data stored elsewhere.

The most frequently used type of cache is in-memory cache which stores cached data in RAM. When data is requested and found in the cache, it is served from RAM instead of from a database. This way, we avoid calling costly backend when a user requests data.

A URL shortener is a type of application that has more read requests than write requests which means it is an ideal application to use cache.

To enable caching in a Spring Boot application, we just need to add the @EnableCaching annotation to the UrlShortenerApiApplication class.

After that, in the controller we need to set the @Cacheable annotation above the GET method. This annotation automatically stores the result of the method call in the cache. In the @Cacheable annotation, we set the value parameter, which is the name of the cache, and the key parameter, which is the cache key. In this case, for the cache key we are going to use 'shortUrl' because we are sure it is unique. The sync parameter is set to true to ensure only a single thread is building the cache value.

And that is it - our cache is set and when we first load the URL with some short link, the result will be saved to cache and any additional call to the endpoint with the same short link will retrieve the result from the cache instead of from the database.
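
In Spring, the annotation would look roughly like @Cacheable(value = "urls", key = "#shortUrl", sync = true), where the cache name urls is only a placeholder. To show what this does behind the scenes, here is a small plain-Java sketch of the same idea. It is not how Spring implements caching; the class ShortUrlCache and the loader function (which stands for the database lookup) are made up for the illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class ShortUrlCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> loader; // e.g. the database lookup
    private final AtomicInteger loads = new AtomicInteger();

    public ShortUrlCache(Function<String, String> loader) {
        this.loader = loader;
    }

    /**
     * Returns the cached long URL for the short link, calling the loader
     * only on a cache miss. computeIfAbsent runs the loader at most once
     * per key even with concurrent callers, similar in spirit to sync = true.
     */
    public String get(String shortUrl) {
        return cache.computeIfAbsent(shortUrl, key -> {
            loads.incrementAndGet();
            return loader.apply(key);
        });
    }

    /** Removes an entry, for example when a short link expires. */
    public void evict(String shortUrl) {
        cache.remove(shortUrl);
    }

    /** Number of times the loader (the "database") was actually called. */
    public int loadCount() {
        return loads.get();
    }
}
```

Calling get twice with the same short link hits the loader only once; after evict, the next call goes to the loader again.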

Dockerization

Dockerization is the process of packaging an application and its dependencies in a Docker container. Once we configure Docker container, we can easily run the application on any server or computer that supports Docker.

The first thing we need to do is to create a Dockerfile. A Dockerfile is a text file that contains all the commands a user could call on the command line to assemble an image.

    FROM openjdk:13-jdk-alpine   
    COPY ./target/url-shortener-api-0.0.1-SNAPSHOT.jar /usr/src/app/url-shortener-api-0.0.1-SNAPSHOT.jar    
    EXPOSE 8080    
    ENTRYPOINT ["java","-jar","/usr/src/app/url-shortener-api-0.0.1-SNAPSHOT.jar"]

FROM - This is where we set the base image for the build. We are going to use OpenJDK v13, which is a free and open-source version of Java. You can find other base images on Docker Hub, which is a place for sharing Docker images.

COPY - This command copies files from the local filesystem (your computer) to the filesystem of the container at the path we specified. So we are going to copy the JAR file from the target folder to /usr/src/app folder in the container. I will explain creating the JAR file a bit later.

EXPOSE - Instruction that informs Docker that the container listens on the specified network ports at runtime. The default protocol is TCP and you can specify if you want to use UDP.

ENTRYPOINT - This instruction allows you to configure a container that will run as an executable. Here we need to specify how Docker will run our application. The command to run an application from the .jar file is

    java -jar <app_name>.jar

so we put those three words in an array and that is it.

Now that we have a Dockerfile, we should build the image from it. But like I mentioned before, we first need to create a .jar file from our project so the COPY command in the Dockerfile can work properly. To create an executable .jar we are going to use Maven. We need to make sure the Spring Boot Maven plugin is declared in our pom.xml. If it is missing, we can add it:

    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>

After that, we should just run the command

    mvn clean package

After that is done, we can build a Docker image. We need to make sure we are in the same folder where a Dockerfile is so we can run this command

    docker build -t url-shortener:latest .

-t is used to tag an image. In our case, that means that the name of the repository will be url-shortener and the tag will be latest. Tagging is used for the versioning of images. After that command is done, we can make sure we created an image with the command

    docker images

The output should list the url-shortener repository with the latest tag.

For the last step, we should run our containers. I say containers because we will also run the MySQL server in a Docker container. The database container will be isolated from the application container. To run the MySQL server in a Docker container, simply run

    $ docker run --name shortener -e MYSQL_ROOT_PASSWORD=my-secret-pw -d -p 3306:3306 mysql:8

You can see the documentation on Docker Hub.

When we have a database running inside a container, we need to configure our application to connect to that MySQL server. Inside application.properties, set spring.datasource.url to connect to the 'shortener' container.
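
A minimal application.properties for this setup might look like the fragment below. The database name url_shortener is a placeholder I picked for the example, and the credentials simply mirror the docker run command above; adjust them to your own setup.

```properties
# Hostname "shortener" is the name of the MySQL container started above.
# The database name url_shortener is a placeholder; the database must already exist.
spring.datasource.url=jdbc:mysql://shortener:3306/url_shortener
spring.datasource.username=root
spring.datasource.password=my-secret-pw
```

One way to have the database created on first start is to pass the MYSQL_DATABASE environment variable to the MySQL container.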

Because we made some changes in our project it is required to pack our project into a .jar file using Maven and build the Docker image from the Dockerfile again.

Now that we have a Docker image, we should run our container. We do that with the command

    docker run -d --name url-shortener-api -p 8080:8080 --link shortener url-shortener

-d means that the Docker container runs in the background of your terminal.

--name lets you set the name of your container.

-p host-port:container-port - This simply maps ports on your local computer to ports inside the container. In this case, we exposed port 8080 inside the container and decided to map it to our local port 8080.

--link - With this, we link our application container with the database container so that the containers can discover each other and securely transfer information from one container to another. It is important to know that this flag is a legacy feature and may eventually be removed. Instead of links, we would need to create a network to facilitate communication between the two containers.

url-shortener - This is the name of the Docker image we want to run.

And with this, we are done - in browser visit http://localhost:8080/swagger-ui.html

Now you can publish your image to DockerHub and easily run your application on any computer and server.

There are two more things I want to talk about to improve our Docker experience. One is multi-stage build and the other is docker-compose.

Multi-stage build

With multi-stage builds, you use multiple FROM statements in your Dockerfile. Each FROM instruction can use a different base, and each of them begins a new stage of the build. You can selectively copy artifacts from one stage to another, leaving behind everything you don’t want in the final image.

Multi-stage builds are good for us to avoid manually creating .jar files every time we make some change in our code. With multi-stage builds, we can define one stage of the build that will do the Maven package command and the other stage will copy the result from the first build to the filesystem of a Docker container.

You can see the complete Dockerfile here.
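
As a rough idea of what such a two-stage Dockerfile could look like (not necessarily the same as the one in the repository), here is a minimal sketch. The Maven image tag is an assumption; pick one whose JDK can build your project's Java version. Skipping tests during the image build is also just a choice made for the example.

```dockerfile
# Stage 1: build the JAR with Maven (image tag is an assumption)
FROM maven:3.8.5-openjdk-17 AS build
WORKDIR /usr/src/app
COPY pom.xml .
COPY src ./src
RUN mvn clean package -DskipTests

# Stage 2: copy only the JAR from the first stage into the runtime image
FROM openjdk:13-jdk-alpine
COPY --from=build /usr/src/app/target/url-shortener-api-0.0.1-SNAPSHOT.jar /usr/src/app/url-shortener-api-0.0.1-SNAPSHOT.jar
EXPOSE 8080
ENTRYPOINT ["java","-jar","/usr/src/app/url-shortener-api-0.0.1-SNAPSHOT.jar"]
```

Only the second stage ends up in the final image, so Maven and the source code are left behind.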

Docker-compose

Compose is a tool for defining and running multi-container Docker applications. With Compose, you use a YAML file to configure your application’s services. Then, with a single command, you create and start all the services from your configuration.

With docker-compose, we will pack our application and database into single configuration file and then run everything at once. This way we avoid running MySQL container and then linking it to the application container every time.

Docker-compose.yml is pretty much self-explanatory - first we configure the MySQL container by setting the image to mysql version 8.0 and the credentials for the MySQL server. After that, we configure the application container by setting build parameters, because we need to build an image instead of pulling it like we did with MySQL. Also, we need to set that the application container depends on the MySQL container.
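
A minimal sketch of such a file is shown below. The service names, the password, and the database name url_shortener are placeholders for the example, and the real file in the repository may differ. The SPRING_DATASOURCE_URL environment variable overrides spring.datasource.url in Spring Boot.

```yaml
version: "3.8"
services:
  mysql:
    image: mysql:8.0
    environment:
      MYSQL_ROOT_PASSWORD: my-secret-pw   # placeholder credentials
      MYSQL_DATABASE: url_shortener       # placeholder database name
    ports:
      - "3306:3306"
  api:
    build: .                              # build the image from the local Dockerfile
    ports:
      - "8080:8080"
    depends_on:
      - mysql
    environment:
      SPRING_DATASOURCE_URL: jdbc:mysql://mysql:3306/url_shortener
```

Keep in mind that depends_on only controls the start order. It does not wait until MySQL is ready to accept connections, so the application may need to retry its first connection.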

Now we can run the whole project with only one command:

    docker-compose up

MySQL scheduled event

This part is optional, but I think somebody might find it useful anyway. I talked about the expiration date of the short link, which can be user-defined or some default value. For this problem, we can set a scheduled event inside our database. This event would run every x minutes and would delete any row from the database where the expiration date is earlier than the current time. Simple as that. This works well with a small amount of data in the database.
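
Such an event could be created with a script along the following lines. The table name url and the column name expires_date are assumptions about how the entity is mapped, and the event name is made up; check your actual schema before using it.

```sql
-- Make sure the event scheduler is running
-- (it is enabled by default in recent MySQL 8.0 releases).
SET GLOBAL event_scheduler = ON;

-- Every 2 minutes, delete all rows whose expiration date has passed.
CREATE EVENT IF NOT EXISTS delete_expired_urls
    ON SCHEDULE EVERY 2 MINUTE
    DO
        DELETE FROM url
        WHERE expires_date IS NOT NULL
          AND expires_date < NOW();
```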

Now I need to warn you about a couple of issues with this solution.

First - This event will remove records from the database, but it will not remove data from the cache. Like we said before, the cache does not look in the database when it finds matching data in its own store. So even after we delete a row from the database, the cached entry can still be served.

Second - In my example script, the event runs every 2 minutes. If our database becomes huge, the event might not finish execution within its scheduling interval, and the result may be multiple instances of the event executing simultaneously.
