MicroServices — (not) a Fix All Solution

Vikas Solegaonkar

Posted on May 20, 2023

A recent blog post from Amazon Prime Video has caused ripples across various platforms and forums. It came as a shock to all of us when Amazon, one of the early advocates of MicroService and Serverless architecture, started talking against it!

I am sure this would have stalled several modernisation projects. Perhaps some “early adopters” have started working on a “back to monolith” drive!

I am sure you are not one of those :)

Time to Introspect

However, I do feel this is an occasion to pause and introspect. Are we following an architectural paradigm because it is the trend? Or have we analysed it enough to find out what is right for our business case?

That leads us to a bigger question — what are the parameters for this analysis? How do we identify the right paradigm for this business case?

Foremost, we should understand that there is no fix-all solution in life. Monolithic architectures have several benefits that we lose when we move to MicroServices, which bring benefits of their own. Each option has its pros and cons. The question is, which ones are more relevant to our business problem?

The Prime Story

Let us first look at what happened at Prime Video. Foremost, understand that they have not dumped MicroServices or even Serverless. However, they noticed that one particular service did not run optimally on Lambda functions orchestrated by Step Functions. Hence, they migrated it to EC2.

Does this mean that any service will work better on EC2? Are they planning to move all their services to EC2? Certainly not! There was something special about this service that made it a better fit for EC2. Let’s check this in detail.

As per their blog, this service is responsible for Video Quality Analysis. It monitors the contents of every stream of video being played. It had three main components: a media converter that turns the streamed video/audio into a stream of data for the defect detectors; the defect detectors, which run ML algorithms over that data to identify any defects; and a component that orchestrates this flow.

The process used S3 buckets as intermediate storage for the video frames! After reading this description, I wondered what took them so long to move it out of Serverless! Let us look at the obvious problems in this workload.

Continuous v/s Event Driven

It is a continuous process, not an event-driven one. It is working “most of the time”, without rest, rather than in response to specific events. Lambda functions work best when the workload fluctuates. We all know that. The question is, how uniform is “uniform”, and how high is a “high” load?

Let’s look at some numbers to understand this better. Compute in a Lambda function is roughly five times costlier than the same compute on a server. The billing granularity is milliseconds. Thus, we are better off with a Lambda function if the system is idle for more than four-fifths of the time.

If our function runs in 10ms, and we have a uniform, consistent load of exactly one request per 50ms (72K requests per hour), the function is busy for one-fifth of the time, so Lambda roughly breaks even with a server on compute cost. Any lighter load favours Lambda. Of course, with non-uniform usage, Lambda functions can compete with servers at a much higher load. A lot of systems, by nature, require only a couple of invocations per hour per user. They have no reason to use a server.
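
The break-even arithmetic above can be sketched in a few lines of Python. This is only a rough model: the 5x price ratio is the article's approximation, the function names are made up for illustration, and it ignores per-request fees, memory sizing and concurrency.

```python
def lambda_to_server_cost_ratio(run_ms: float, interval_ms: float,
                                price_ratio: float = 5.0) -> float:
    """Compute cost of Lambda relative to one always-on server.

    run_ms       execution time of a single request
    interval_ms  time between requests under a uniform load
    price_ratio  assumed Lambda-to-server price per unit of compute time
                 (the rough 5x figure used in this article)
    """
    busy_fraction = run_ms / interval_ms  # share of time the function is running
    return busy_fraction * price_ratio


def requests_per_hour(interval_ms: float) -> float:
    """Number of requests per hour for one request every interval_ms."""
    return 3_600_000 / interval_ms


ratio = lambda_to_server_cost_ratio(run_ms=10, interval_ms=50)
hourly = requests_per_hour(50)
print(f"requests per hour: {hourly:.0f}")
print(f"Lambda cost / server cost: {ratio:.2f}")
```

A ratio below 1 favours Lambda and a ratio above 1 favours the server. With a 10ms function called every 50ms, the ratio lands on the break-even point.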

Information v/s Data

Ideally, a MicroService should consume data and pass on information. However, these services pass the full chunk of data to one another, and each extracts the information it needs for itself.

Naturally, they have a huge data transfer overhead. In a distributed system, this means data flowing back and forth between availability zones, or even regions. This leads to huge wastage. A monolith eliminates this overhead and improves both performance and cost.

This was not a problem with the MicroService paradigm or with Lambda functions. It was a problem with the way the workload was split into services, which required a huge amount of data to pass between them. The key to success with a MicroService architecture lies in splitting the process correctly.
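
To make the difference between passing data and passing information concrete, here is a small Python sketch. The detector, the frame size and the field names are all hypothetical; this is not how Prime Video's detectors work, only an illustration of how much smaller the result is than the input.

```python
import json

# Assumed size of one raw RGB frame at 1920x1080 (illustrative only).
FRAME_BYTES = 1920 * 1080 * 3


def detect_defect(frame: bytes) -> dict:
    """Hypothetical detector: reduces a whole frame to a tiny summary."""
    black_pixels = frame.count(0)  # number of zero-valued bytes in the frame
    return {
        "black_ratio": black_pixels / len(frame),
        "defect": black_pixels == len(frame),  # flag a fully black frame
    }


frame = bytes(FRAME_BYTES)  # an all-black frame used as test input
info = detect_defect(frame)

data_payload = len(frame)                      # bytes moved if we ship the frame
info_payload = len(json.dumps(info).encode())  # bytes moved if we ship the result

print(f"data payload: {data_payload} bytes")
print(f"info payload: {info_payload} bytes")
```

If the service boundary sits after the detector, only the few dozen bytes of the summary cross the network. If it sits before the detector, every frame does.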

ML Algorithms

The above services ran ML algorithms, which are naturally processor intensive. Lambda functions are useful only if they are short and sweet. If you have a compute-intensive ML workload, you are better off with servers that have GPU hardware, which can do a much better job for you.

Step Functions

They also had the additional overhead of Step Functions transitioning between states for each stream for each user. That is a huge cost overhead. Step Functions are an elegant way to manage states if we identify the states judiciously. For everything else, a simple if/else block is always more efficient.
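
As a minimal sketch of what “a simple if/else block” can look like, the Python below runs conversion and detection in one process with plain control flow. The converter and the detector are hypothetical stand-ins, not the components from the Prime Video post.

```python
from typing import Callable, Iterable, List, Optional

Frames = List[bytes]


def analyse_stream(
    chunks: Iterable[bytes],
    convert: Callable[[bytes], Frames],
    detectors: List[Callable[[Frames], Optional[str]]],
) -> List[str]:
    """Convert each chunk and run every detector on it, all in one process."""
    findings: List[str] = []
    for chunk in chunks:
        frames = convert(chunk)
        if not frames:
            # Nothing to analyse in this chunk; a plain branch skips it.
            continue
        for detect in detectors:
            finding = detect(frames)
            if finding is not None:
                findings.append(finding)
    return findings


# Hypothetical converter: cuts a chunk into fixed-size "frames".
def split_into_frames(chunk: bytes, frame_size: int = 4) -> Frames:
    return [chunk[i:i + frame_size] for i in range(0, len(chunk), frame_size)]


# Hypothetical detector: reports frames whose bytes are all zero.
def black_frame_detector(frames: Frames) -> Optional[str]:
    black = sum(1 for f in frames if not any(f))
    return f"{black} black frame(s)" if black else None


findings = analyse_stream(
    chunks=[b"\x00\x00\x00\x00\x01\x02\x03\x04", b"", b"\x05\x06\x07\x08"],
    convert=split_into_frames,
    detectors=[black_frame_detector],
)
print(findings)
```

Every decision here is an in-memory branch, so there is no per-transition charge and no intermediate storage between the steps.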

S3 Buckets for transactional data

S3 buckets are the worst option for transient data storage. The read/write request cost on S3 is far too high relative to the storage cost. If Lambda functions orchestrated by Step Functions exchanged data through S3 buckets, it is no surprise that they ran into such high cost and overhead.
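
A rough Python sketch shows why request charges dominate when S3 holds short-lived objects. All the numbers passed in below are illustrative assumptions (object rate, object size, retention and prices); check the current S3 pricing page before relying on any of them.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # 30-day month, for simplicity


def s3_monthly_costs(
    objects_per_second: float,
    object_kb: float,
    retention_seconds: float,
    put_price_per_1000: float,
    get_price_per_1000: float,
    storage_price_per_gb_month: float,
) -> dict:
    """Monthly request cost versus storage cost for transient objects.

    Assumes each object is written once, read once, and deleted after
    retention_seconds, so only a small working set is stored at any time.
    """
    objects_per_month = objects_per_second * SECONDS_PER_MONTH
    request_cost = objects_per_month / 1000 * (put_price_per_1000 + get_price_per_1000)
    stored_gb = objects_per_second * retention_seconds * object_kb / 1_000_000
    storage_cost = stored_gb * storage_price_per_gb_month
    return {"requests": request_cost, "storage": storage_cost}


# Illustrative inputs only: one 100 KB object per second, kept for a minute,
# with assumed per-request and per-GB prices.
costs = s3_monthly_costs(
    objects_per_second=1,
    object_kb=100,
    retention_seconds=60,
    put_price_per_1000=0.005,
    get_price_per_1000=0.0004,
    storage_price_per_gb_month=0.023,
)
print(f"requests: ${costs['requests']:.2f} per month")
print(f"storage:  ${costs['storage']:.6f} per month")
```

Under these assumptions the storage bill is negligible and almost the entire cost comes from the PUT and GET requests, which is the pattern described above.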

When (not) to use MicroServices

It is easy to criticise past mistakes. The question is, how do we use these learnings in our products? When should we choose a Monolith or a MicroService?

I think MicroService is a way of thinking, more than any syntax or toolkit. That is universal, and I cannot think of a scenario where it would fail. The only question we need to answer is whether we deploy individual MicroServices as Docker containers or Lambda functions, or as a group of modules compiled together into a monolith.

This can be decided based on several factors, such as the kind of data passing between them, the need for independent scaling, resilience, etc. The only way to succeed at this game is to evaluate your requirements objectively, not just follow a trend.
