Rajdeep Chandra
Posted on June 9, 2020
This is one of my few system design tutorials which I am gonna post in next few days where I will tell you about good concrete failsafe solutions in building highly scalable distributed systems.
A brief on what we are gonna learn:
Designing a system is always an important or The Most important factor in a high traffic system. While designing something like a UrlShortener, a ticketing application or an eCommerce app where millions of users will do reads and writes. You system should be able to handle such amount of traffic smoothly efficiently and robustly. So to do this we need to design our stack our workflows and the system at scale.
Today we are gonna build a URL shortener like tinyurl.com. We will see what all algorithms can be used, good and shortcomings of those and ultimately which suits the best for this system.
So the first checkpoint of designing a system is to make few concrete assumptions:
In this case we can assume: the length of the url which can be 7 characters long and how many hits will come to our system. Lets say our system will be capable of getting 1Million hits/day ie 30 M hits/day and so on….
Building Data Capacity Model:
So to build our database we need to build a data model which will consist of the following entries:
- Long URL - 2KB of size
- Short URL - 7 chars
- Created datestamp - 7 bytes
- Expiry date - 7 bytes
- Its quite simple right… No its actually not when you want to build a system which scales. For a single server system, its quite simple but in real life scenarios these systems should scale.
Algorithms:
Now we come to the implementation logic part. So to achieve this.. well almost we can use two algorithms:
- base62
- MD5 Hash
So both the above two algorithms can be used to get the random hashes.We are using base62 since with using base62 we can generate 62⁷ ie more than 3 trillion combinations of strings and the same as with MD5 Hash. A light problem with MD5 Hash is that it gives us 20–22 character long hash values where are requirement is only 7 characters so to achieve that we need to pickup first 7 characters of the MD5 Hash value. That is ok for the time being…
Trying to build the system with base62 and MD5 Hash:
First lets try to build the system with base62: So we take our long url for e.g: www.abc.com/ayhasd/asdeqwe/qweqw?yusdsd=123123&kqwenoow
We pass this value to our base62 service and it will return me a 7 character unique key such as an1132s which we will pass it to our url shortener domain and build our short url like www.shorturl.com/an1132s which on hitting will redirect us to the desired long url. This works nicely for a single system but suppose when millions of users are using the system there has to be system which will have parallel processing or sharding or multiple servers serving those requests instead of one server.
So in such cases there rises our problem. Since multiple servers are serving different requests there can be cases where two servers will come to a point where they will return same 7 characters base62 value will be same for two different long urls. So if we are using a NoSQL database we will not have methods like INSERT IF and search the database whether that keys exists or not but we will run into a corrupt item or a Database collision is such case.
Trying to solve the problem with a Counter:
Well the above problem can be solved with a counter..OR can it be? Let’s try it out.
To solve the ambiguity we can have a counter or 2 counters which can track or act as a thread safe unique id generators so that we don’t run into duplicate hash values. So we can give one counter a range value from 1–1Million and the second counter to 2 Million — 3 Million. But suppose the range values get exhausted after few months and the counter service will not know how to reset it and it will not have the communication with the other counter since both the counters are working in different servers. So here lies the problem with having a counter service.
Zookeeper to the Rescue:
We can have a failsafe concrete solution to this problem with ZooKeeper. But what is Zookeeper?
A Zookeeper is a configuration management distributed system which acts as a coordination service to manage different services or servers enrolled to it. In simple words it acts as a centralised hub to communicate with different nodes attach to it. Keeping this in mind let us draw a System Design Diagram to solve our scaling problem.
So, a user uploads a long url which goes through a load balancer to determine which server to send the payload to. Assuming Server 1 is free to accept a payload and when it gets the request it increments its counter which ranges from 1 million — 2 million and assigns a value to the request. The Request then goes to the base62 service and gets its new shortened 7 character key.
The main job of Zookeeper here is to assign each server with a particular counter range and keep track of its changes so that there is no duplicate keys or database collision.
Let us Scale:
Suppose say Server 3 went down after a certain time so it will remove the entry in the Zookeeper service and make it unassigned again. And let us add another server 4 to the system and it will be assigned to a new range of 4 m — 5 M. This is how we can easily scale up and down the system without any downtime. And since Zookeeper is a service with multiple servers in it so it will eventually be a failsafe system.
Link to the GitHub Repo:
Live App:
Link to Download Chrome Extension:
A full E2E application to shorten your urls and save it on your profile. It gives you flexibity to choose your domain…
Thanks for reading! If you have any questions, feel free to reach out at rajrock38@gmail.com, connect with me on LinkedIn, or follow me on Medium and Twitter.
If you found this article helpful, it would mean a lot if you gave it some applause👏 and shared to help others find it! And feel free to leave a comment below.
Posted on June 9, 2020
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.
Related
October 30, 2024
September 5, 2024