System Design Case Study of an Online Booking System.
Srikanth
Posted on August 16, 2023
Have you ever experienced the rush of online shopping? I bet you felt an adrenaline rush as you clicked through the website, competing with thousands or even millions of people at the same time.
Top tech companies are behind some of these websites. They are capable of serving millions of users simultaneously. In this blog post I write about an online booking system that you may not have heard about. It is the system powering the booking process of the most visited Hindu Temple in the world, Tirumala.
Tirumala Tirupati Devasthanam is the independent trust which manages the aforementioned popular temple in the state of Andhra Pradesh, India.
My family visits this temple often. I have seen the digital systems supporting the temple evolve over the years. Last year I was planning to visit the temple for the first time since the pandemic, and was struck by the new online booking experience. It was nearly impossible to buy tickets. At first I attributed this to the rapid digitalisation India has seen in the past few years. Although that does play a part, what I learned by analysing the website through a system design lens was fascinating.
My intention is to share my analysis and, if possible, start a discussion around promoting user-centric design in Indian public digital services. I do not intend to disparage the current developers of the TTD booking system. I am only exercising my curiosity in this topic.
In this blog post I cover:
- An overview of the TTD online booking system: Visualising the user journey through the booking process.
- How does it fare in practice: A brief look at how the system performs in the real world and common user complaints.
- Hard challenges: What are the common challenges faced in designing highly scalable and concurrent systems?
- Analysing the weakness: I attempt to explain the critical pain points of the system borne out of its design.
- Ways to improve: I conclude by suggesting ways to enhance the user experience. I offer solutions which are both technical and product design focussed.
How does the booking system work?
The bookings become LIVE at an announced time.
Bookings for various services, including Special Entry Darshan, are open at specific dates and times. This information is communicated through the TTD website, social media, TV, and YouTube channels.
Special Entry Darshan offers a chance for devotees to have a viewing or be in the presence of a God at a temple. These bookings are typically accessible in the last week of the current month for the following month, although the exact dates can vary. Sometimes, the bookings for the following two months are made available simultaneously.
It's important to stay attentive to updates regarding opening times. It's uncertain whether the TTD app provides push notifications for such announcements.
You must be prepared to grab your slot.
- People are required to be registered on the website with their email and phone number. One can log in to the website using a phone number and a one-time password sent via SMS.
- Let's say the bookings are going live at 10 AM on Friday. We need to be logged in on the app or website as close to 10 AM as possible. We get logged out automatically after a short interval, so timing is important. 🕰
It's time for some speedy mouse clicks!
- After you have logged in, click on Special Entry Darshan in the "Online Services" menu and you will be shown a countdown timer. You are essentially in a virtual waiting room, waiting to be let in. The waiting time seems to vary depending on the traffic.
- Now we are finally in. We see a calendar interface with, hopefully, lots of open slots to book. Now we need to select a date and time slot to book tickets.
- After we enter the number of people who are going to attend, we are redirected to a page that asks us to fill in details of each attendee. There are 2 dropdowns and 3 free text fields for each attendee. We fill everything in correctly and get directed to a payment page.
- There are several options to pay. We select one and complete the payment. Finally we get redirected to a success page which shows a booking reference number and details about how to download the booking confirmation.
So how does the system perform?
- We saw quite a few steps there, and you can see that a lot of work went into implementing the system.
- I observe some pain points here. Firstly, the wait time to enter the booking system is a bit opaque and doesn't make the experience smooth. (I know why this wait time is being used; we will talk about it in the next section.) Secondly, there is a lot of information to fill in before finalising payment.
- Both of the above slow the user down on the way to completing the booking. I have personally experienced this and so have many others. By the time we are able to fill in all the details and click continue, we are shown an error that "the slot was just booked by another pilgrim".
What are the challenges here?
- We know that building software that attracts traffic in millions is hard. It requires real technical skills to ensure the system can handle the peak time traffic.
- Let's look at the technical challenges in some detail. The obvious one is simply being able to serve the peak time traffic. I am not aware of the internals, but I can guess that there is a load balancer sitting in front of multiple servers or containers to manage load. Since the bookings go LIVE at a known time, preemptively increasing the server capacity is a good way to deal with this situation.
- In one of the press releases, TTD stated that after the launch of the new booking system they saw 240,000 tickets booked within an hour. Around 900,000 concurrent sessions were open at the peak time. And a total of 10 million hits were recorded by the system on that day.
- Designing a booking system capable of serving ~1 million concurrent users, with 240,000 available tickets, means dealing with issues like scalability, performance, and data integrity.
- Moreover, preventing multiple users from booking the same ticket concurrently presents intricate challenges. These involve mitigating race conditions, maintaining atomicity throughout multi-step booking processes, and handling the intricacies of reservation periods.
- A locking and concurrency control mechanism that avoids data inconsistencies becomes a crucial part of the architecture design. We would need a database that offers strong consistency and transaction support but is also easily scalable. Without these, race conditions can lead to redundant bookings of the same ticket.
- Striking the right balance between ensuring all the technical intricacies and providing a seamless user experience will be a complex task.
- This is also a product design challenge. Ultimately the goal of this system is to facilitate problem-free booking. To overcome the technical challenges, certain design decisions have been made here. The big one is the virtual queue. The intention is to hold people in a queue so that the main system is not overwhelmed by load. But it brings a different set of challenges that we will see next.
What problems do I see in the current design?
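To make the race-condition point above concrete, here is a minimal sketch of one common way to stop two users from taking the same ticket: a single conditional UPDATE that only succeeds if enough tickets are still left. This is not how TTD implements it (I don't know the internals). The table name, column names and helper functions are all hypothetical, and SQLite is used only so that the example runs on its own.

```python
import sqlite3

def make_db(slots):
    """Create an in-memory database; `slots` maps slot id -> tickets available."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE slots (slot_id TEXT PRIMARY KEY, remaining INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO slots VALUES (?, ?)", slots.items())
    conn.commit()
    return conn

def reserve(conn, slot_id, tickets):
    """Atomically take `tickets` from a slot; return True only if enough were left."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE slots SET remaining = remaining - ? "
            "WHERE slot_id = ? AND remaining >= ?",
            (tickets, slot_id, tickets),
        )
        # The availability check and the decrement happen in one statement,
        # so two requests cannot both pass the check on stale data.
        return cur.rowcount == 1

def remaining(conn, slot_id):
    row = conn.execute(
        "SELECT remaining FROM slots WHERE slot_id = ?", (slot_id,)
    ).fetchone()
    return row[0]

conn = make_db({"2023-09-02T10:00": 5})
print(reserve(conn, "2023-09-02T10:00", 4))  # True: 4 of 5 tickets taken
print(reserve(conn, "2023-09-02T10:00", 2))  # False: only 1 ticket left
print(remaining(conn, "2023-09-02T10:00"))   # 1
```

A real system would also need the reservation to expire if the user never pays, which this sketch leaves out.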
- I think the biggest problem is users reaching one step before payment and being told that the ticket is already booked by someone. Why does that happen? When we select the online booking service we are first added to a virtual queue. The waiting time in the queue is proportional to the traffic.
- Let us consider that 900,000 users are trying to grab 240,000 tickets at the same time. I don't have any internal information about the system design. But let's say 20,000 users are allowed in at a given time.
- Since the number of tickets is much greater than the number of admitted users, these 20,000 users are supposed to have a 100% chance of booking their tickets.
- In reality that is not the case. Users are trying to book slots for a specific date and time period. There is a real chance that X% of users choose the same time slot, and if they outnumber the places available in that slot, some of them will encounter an error. Weekends and holidays are often booked first, which makes overlapping bookings more likely.
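The argument above can be checked with a small simulation. Every number in it is an assumption of mine apart from the 240,000 total: 30 days with 8,000 tickets each, one ticket per user, 120,000 users let in so far, and weekend days being five times as popular as weekdays. With those weights the expected demand for each weekend day is roughly 120,000 × 5 / 62 ≈ 9,700, which is more than 8,000, so some users are turned away even though about half of all tickets are still unsold.

```python
import random

def simulate(num_users, slots, capacity, weights, seed=0):
    """Each user picks one slot at random (by weight); return (booked, turned_away)."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    left = [capacity] * slots
    booked = turned_away = 0
    for choice in rng.choices(range(slots), weights=weights, k=num_users):
        if left[choice] > 0:
            left[choice] -= 1
            booked += 1
        else:
            # This is the "slot was just booked by another pilgrim" case.
            turned_away += 1
    return booked, turned_away

# Assumed popularity: days 5 and 6 of every week are weekend days.
weights = [5 if day % 7 in (5, 6) else 1 for day in range(30)]
booked, turned_away = simulate(120_000, 30, 8_000, weights)
print(f"booked={booked} turned_away={turned_away}")
```

The simulation ignores group bookings and retries, so it only illustrates the direction of the effect, not its real size.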
There is also a question of how to manage the virtual queue. Let us assume that N users gain access to the system every minute. A user who is already in needs to complete the process before new users enter and start competing for the same slots. The longer one takes, the more competition one faces.
I found a YouTube video which revealed a design flaw. Apparently the waiting time for virtual queue can be changed by refreshing the page. The system seems to be assigning wait times either randomly or based on variables which are highly volatile.
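One way to avoid the refresh behaviour described above is to derive the wait time only from the user's place in a first-come queue, so that rejoining changes nothing. The sketch below is my own illustration, not the actual TTD queue. The class name `WaitingRoom` and the `admit_per_tick` parameter are made up for the example.

```python
class WaitingRoom:
    """FIFO waiting room where rejoining (e.g. a page refresh) keeps the original place."""

    def __init__(self, admit_per_tick):
        self.admit_per_tick = admit_per_tick
        self._queue = {}  # user_id -> None; dicts preserve insertion order

    def join(self, user_id):
        """Add the user if new; return their 1-based position either way."""
        self._queue.setdefault(user_id, None)
        # Linear scan is fine for a sketch; a real queue would track positions.
        return list(self._queue).index(user_id) + 1

    def estimated_ticks(self, user_id):
        """Ticks until admission, derived only from the queue position."""
        position = self.join(user_id)
        return -(-position // self.admit_per_tick)  # ceiling division

    def admit(self):
        """Let the next batch in, oldest arrivals first."""
        batch = list(self._queue)[: self.admit_per_tick]
        for user_id in batch:
            del self._queue[user_id]
        return batch

room = WaitingRoom(admit_per_tick=2)
for user in ("asha", "ravi", "meena"):
    room.join(user)
print(room.join("meena"))   # 3: a refresh does not change the position
print(room.admit())         # ['asha', 'ravi']
print(room.join("meena"))   # 1: now at the front
```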
- With the above context, having to enter several fields of data per person manually before being able to confirm the booking is a significant obstacle for users trying to book a larger number of tickets at a time. This is probably why many people have had a bad experience with booking.
How would I improve this?
The goal of this exercise is to minimise the number of errors faced by users. Specifically the errors that stop them from completing their booking. I want to make the system more fair and transparent. Right now it favors someone who is very computer savvy or someone who knows about the loopholes in this system.
Small tweaks: A technical solution.
- As described above, the process of entering pilgrim data manually slows the users down. Since this step comes after choosing the time slot, the chosen slots may be filled in the interim. If we tweak the system so that the data needs to be entered before choosing the time slot, it would give users a more accurate picture of available places.
- Similarly, letting users save pilgrim information ahead of time and select it during the booking process can also improve the experience.
- When users face the “the slot was just booked by another pilgrim” error, they can be given an option to choose an alternative time slot. They need not re-enter the queue for a chance to book a slot.
- In order to make the system more transparent, a status message can be displayed to the users. This can indicate approximately how many users are online and how many tickets are available.
- Lastly, I would monitor the number of errors caused by 2 or more users trying to book the same time slot. This will not only give us the current state of the system but will also become the metric to measure the success of any future changes.
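Two of the tweaks above, offering alternative slots on a conflict and counting those conflicts as a metric, can be sketched together. The function name, the shape of the `availability` dictionary and the `conflicts` counter are all assumptions for illustration. The sketch is single-threaded and in-memory, so it shows only the user-facing behaviour, not the concurrency control a real system would need.

```python
from collections import Counter

conflicts = Counter()  # slot_id -> number of "just booked by another pilgrim" errors

def book_or_suggest(availability, slot_id, tickets, max_suggestions=3):
    """Book if possible; otherwise record the conflict and suggest other slots."""
    if availability.get(slot_id, 0) >= tickets:
        availability[slot_id] -= tickets
        return {"booked": slot_id, "alternatives": []}
    conflicts[slot_id] += 1  # the metric to watch over time
    # ISO-style slot ids sort chronologically, so the earliest options come first.
    alternatives = sorted(
        s for s, left in availability.items() if left >= tickets and s != slot_id
    )
    return {"booked": None, "alternatives": alternatives[:max_suggestions]}

availability = {"2023-09-02": 1, "2023-09-03": 0, "2023-09-04": 6}
print(book_or_suggest(availability, "2023-09-02", 1))  # booked
print(book_or_suggest(availability, "2023-09-02", 2))  # conflict, suggests 2023-09-04
print(dict(conflicts))
```

The user who hits the conflict gets a list to pick from straight away instead of being sent back to the queue.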
Thinking from first principles: A system design solution.
- What if we rethink how the system is supposed to work? An online booking system that is designed as first-come-first-served is probably not a fully fair system. An online booking system that is designed as first-come-first-served, but doesn’t guarantee tickets on a first-come basis, is definitely not a fair or transparent system.
- For inspiration, I turned to sports. Wimbledon, Lord's Cricket Ground and the Old Trafford football stadium all sell a portion of their tickets through a ballot process. Every year people are invited to apply for the ballot, stating which days or matches they want to attend. Once the ballot closes, tickets are assigned randomly. Payment is processed only for those who are successfully assigned tickets.
- We can adopt this system for TTD online booking, although we will need to tweak a couple of things. Due to the Reserve Bank of India’s directive on automatic payments, charging users' cards when they are successfully assigned tickets might not be possible. Instead, users may be sent emails or texts informing them about their tickets and asking them to complete the payment within 72 hours. If a payment is not received, the tickets can be released back to the available pool.
- Some constraints may also be placed on the number of bookings requested by users. Users who haven't recently visited can be given higher priority.
- This approach not only gives everyone an equal chance but also reduces the burden on the system. A ballot can stay open for many days, hence there is no need for a traffic spike to occur.
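The ballot idea above can be sketched in a few lines: shuffle the applications, hand out tickets while a slot still has room, give each winner a 72-hour payment deadline, and return unpaid tickets to the pool. The data shapes and function names are my assumptions. The priority for users who haven't visited recently is left out to keep the sketch short.

```python
import random
from datetime import datetime, timedelta

PAYMENT_WINDOW = timedelta(hours=72)

def run_ballot(applications, capacity, now, seed=None):
    """applications: {user_id: (slot_id, tickets)}; capacity: {slot_id: tickets}."""
    rng = random.Random(seed)
    order = list(applications)
    rng.shuffle(order)  # random draw order instead of first come, first served
    left = dict(capacity)
    offers = {}
    for user_id in order:
        slot_id, tickets = applications[user_id]
        if left.get(slot_id, 0) >= tickets:
            left[slot_id] -= tickets
            offers[user_id] = {
                "slot": slot_id,
                "tickets": tickets,
                "pay_by": now + PAYMENT_WINDOW,
            }
    return offers, left

def release_unpaid(offers, left, paid_users, now):
    """Return tickets from expired, unpaid offers to the available pool."""
    for user_id in list(offers):
        offer = offers[user_id]
        if user_id not in paid_users and now > offer["pay_by"]:
            left[offer["slot"]] += offer["tickets"]
            del offers[user_id]
    return offers, left

closed_at = datetime(2023, 8, 25, 10, 0)
applications = {"u1": ("sep-02", 2), "u2": ("sep-02", 2), "u3": ("sep-02", 2)}
offers, left = run_ballot(applications, {"sep-02": 4}, closed_at)
print(sorted(offers), left)  # two of the three applicants win, no tickets left
```

The released tickets could then go into a second draw among the unsuccessful applicants.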
Takeaways
- The Tirumala Tirupati Devasthanam (TTD) online booking system for the most visited Hindu Temple, Tirumala, is a complex booking system designed to handle peak time traffic.
- The system faces challenges related to scalability, performance, and data integrity, and has to distribute traffic across servers, optimize database operations, minimize latency, and handle potential security threats.
- The virtual queue is the big design decision that TTD made to handle peak time traffic. However, there are design flaws in the system that need to be addressed to make it more fair and transparent.
- There can be several approaches to overcome the challenges faced by the system. I have only discussed a few. It is always helpful to reframe the problem. This allows us to come up with solutions that are simplest and most effective.
Thanks for reading. If you liked this blog post, you can follow me on LinkedIn for more.