Scaling to 125 Million Transactions per Day: Juspay's Engineering Principles
Gorakhnath Yadav
Posted on June 19, 2024
At Juspay, we process 125 million transactions per day, with peak traffic reaching 5,000 transactions per second, all while maintaining 99.99% uptime. Handling such enormous volumes demands a robust, reliable, and scalable system. In this post, we'll walk you through our core engineering principles and how they've shaped our engineering decisions and systems.
When designing systems at this scale, several challenges naturally arise:
- Reliability vs Scale: Generally, as you scale, you tend to exhaust resources, which can affect system availability.
- Reliability vs Agility: Frequent releases and system changes can impact system reliability.
- Scale vs Cost-Effectiveness: Scaling requires more resources, leading to higher costs.
Core Engineering Pillars
We've been able to strike the right balance between these challenges by anchoring our tech stack on four pillars:
- Build zero downtime stacks: Solve reliability by building redundancy at each layer to achieve almost 100% uptime.
- Horizontally scalable systems: Solve scalability by building systems that can scale horizontally by removing bottlenecks.
- Build agile systems for frequent bug-free releases.
- Build performant systems for low latency, high throughput transaction processing.
Adopting Haskell Programming Language
To achieve our goals, we've made a critical investment: adopting the Haskell programming language. Haskell is a functional programming language that compiles to native machine code and can offer performance approaching that of C, which helps us process transactions quickly. With Haskell, we've reduced our transaction processing time to less than 100 milliseconds.
Here's an example of a Haskell function that adds two numbers:
-- Define the add function
add :: Int -> Int -> Int
add x y = x + y

-- Main function to test the add function
main :: IO ()
main = do
  let sum = add 3 5
  putStrLn ("The sum of 3 and 5 is " ++ show sum)
This function showcases Haskell's concise and readable syntax.
Additionally, Haskell code can read almost like English, which lets non-technical colleagues follow the code, verify business logic, and sign off on features during development itself. As a strongly typed language, Haskell checks a set of rules at compile time to ensure consistency of results, helping us preempt failures and work toward zero technical declines.
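To make the strong-typing point concrete, here is a minimal sketch. It is not taken from our codebase: the names `Paise`, `TxnStatus`, and `describe` are hypothetical and chosen only for illustration. It shows how listing every transaction state in a type lets the compiler flag any code path that forgets to handle one.

```haskell
{-# OPTIONS_GHC -Wall #-}
module Main where

-- Hypothetical types for illustration only.

-- Wrapping the amount in a newtype stops it from being confused with
-- any other Int (for example, a retry count) at compile time.
newtype Paise = Paise Int
  deriving (Show, Eq)

-- Every state a transaction can be in is listed explicitly.
data TxnStatus
  = Pending
  | Charged Paise
  | Failed String
  deriving (Show, Eq)

-- With -Wall, GHC warns if a constructor of TxnStatus is left unhandled
-- here, so adding a new status prompts a review of every consumer.
describe :: TxnStatus -> String
describe Pending             = "Payment is pending"
describe (Charged (Paise p)) = "Charged " ++ show p ++ " paise"
describe (Failed reason)     = "Failed: " ++ reason

main :: IO ()
main =
  mapM_ (putStrLn . describe)
    [Pending, Charged (Paise 49900), Failed "insufficient funds"]
```

Passing a bare `Int` where a `Paise` is expected, or omitting one of the three cases in `describe`, is reported by the compiler before the code ever runs.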
Cache-based Shock Absorber
To handle scale and remove database bottlenecks, we introduced a horizontally scalable caching layer. Real-time transactions are served from this cache and later drained to the database.
Scaling up and down the cache layer is relatively easy and cost-effective compared to scaling databases.
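The serve-from-cache, drain-later pattern described above can be sketched as a small pure model. This is an illustration under stated assumptions, not our production design: the `Store` record, `writeTxn`, `readTxn`, and `drain` are hypothetical names, both layers are plain in-memory maps, and durability, ordering, and failure handling are deliberately left out.

```haskell
module Main where

import qualified Data.Map.Strict as Map

-- Hypothetical names throughout; a pure, in-memory model of a
-- write-behind cache in front of a database.
type TxnId  = String
type Status = String

data Store = Store
  { cache    :: Map.Map TxnId Status  -- hot layer serving real-time traffic
  , database :: Map.Map TxnId Status  -- durable layer, written to in batches
  } deriving (Show, Eq)

emptyStore :: Store
emptyStore = Store Map.empty Map.empty

-- Writes land in the cache only, keeping the database off the hot path.
writeTxn :: TxnId -> Status -> Store -> Store
writeTxn k v s = s { cache = Map.insert k v (cache s) }

-- Reads prefer the cache and fall back to the database.
readTxn :: TxnId -> Store -> Maybe Status
readTxn k s = case Map.lookup k (cache s) of
  Just v  -> Just v
  Nothing -> Map.lookup k (database s)

-- Drain moves every cached entry into the database and clears the cache.
-- Map.union is left-biased, so cached (newer) values win over stored ones.
drain :: Store -> Store
drain s = Store
  { cache    = Map.empty
  , database = Map.union (cache s) (database s)
  }

main :: IO ()
main = do
  let s1 = writeTxn "txn_2" "PENDING" (writeTxn "txn_1" "CHARGED" emptyStore)
      s2 = drain s1
  print (readTxn "txn_1" s1)       -- served from the cache
  print (Map.size (database s1))   -- nothing has reached the database yet
  print (Map.size (database s2))   -- both entries drained
  print (readTxn "txn_1" s2)       -- now served from the database
```

Because writes touch only the cache, the database sees batched traffic at a pace it can absorb, which is why adding cache capacity is usually the cheaper way to handle a spike.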
Rapid Deployment and Release Frameworks
With rapid development comes the challenge of frequent production releases. To achieve agility through frequent releases without compromising reliability, we've built internal tools that automate releases with minimal manual effort. These tools monitor each release by benchmarking its error codes against those of the previous stable version of the codebase:
- ART (Automated Regression Tester): A system that records production payloads and runs them in the UAT system against a new deployment to identify bugs early.
- Autopilot: A tool that creates a new deployment and shifts traffic to it in stages, starting from 1%.
- A/B testing framework: A system that monitors and benchmarks the new deployment's performance against the previous stable version. Based on this benchmark, the system automatically decides to scale up the traffic or abort the deployment.
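The scale-up-or-abort decision made by the tools above can be sketched as a pure function. This is a simplified illustration, not the actual Autopilot or A/B framework logic: the `Metrics` and `Decision` types, the `tolerance` threshold, and the doubling schedule in `nextStep` are all assumptions made for the example.

```haskell
module Main where

-- Hypothetical model of a staggered rollout; names and thresholds
-- are assumptions, not the real tools' behaviour.

data Metrics = Metrics
  { requests :: Int
  , errors   :: Int
  } deriving (Show, Eq)

data Decision
  = Promote Int   -- raise the new deployment to this traffic percentage
  | Abort         -- send all traffic back to the stable version
  | Complete      -- the new deployment already serves all traffic
  deriving (Show, Eq)

errorRate :: Metrics -> Double
errorRate m
  | requests m == 0 = 0
  | otherwise       = fromIntegral (errors m) / fromIntegral (requests m)

-- Assumed tolerance: the new deployment may exceed the stable error rate
-- by at most 0.1 percentage points before the rollout is aborted.
tolerance :: Double
tolerance = 0.001

-- Assumed schedule: double the traffic share after each healthy step,
-- never below 1% and never above 100%.
nextStep :: Int -> Int
nextStep pct = min 100 (max 1 (pct * 2))

decide :: Int -> Metrics -> Metrics -> Decision
decide currentPct stable canary
  | errorRate canary > errorRate stable + tolerance = Abort
  | currentPct >= 100                               = Complete
  | otherwise                                       = Promote (nextStep currentPct)

main :: IO ()
main = do
  let stable = Metrics 100000 200                 -- 0.2% errors
  print (decide 1 stable (Metrics 1000 2))        -- 0.2% errors at 1% traffic
  print (decide 8 stable (Metrics 8000 40))       -- 0.5% errors at 8% traffic
  print (decide 100 stable (Metrics 100000 150))  -- healthy at full traffic
```

With these assumed numbers, the first call promotes the deployment to 2% of traffic, the second aborts because the error rate exceeds the tolerance, and the third reports the rollout as complete.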
Hyperswitch: An Open-Source Payments Switch
We're carrying these learnings forward to our latest product, Hyperswitch, an open-source payments switch. Every line of code powering our stack is available for you to see. With Hyperswitch, our vision is to ensure every business has access to world-class payment infrastructure.
Conclusion
Through these investments, we've built reliable, agile, and scalable systems, enabling our engineers to solve exciting new problems and fostering a culture of systems thinking within the company. We encourage developers to engage with the open-source Hyperswitch project and explore the principles and technologies we've adopted to handle massive scale and high-volume transaction processing.