How to approach performance issues
Malek
Posted on January 27, 2023
Hello there, let me guess: you are dealing with tasks or endpoint calls that take multiple seconds, but your users are in a rush and can't wait that long. So you need to tackle those latency concerns, and you'd like some recommendations from people who have already walked that path (and maybe reduced their API call latency by 80%!). You've come to the right address!
The process toward an efficient performance improvement plan includes:
1-Identifying the slowest endpoints:
Usually we start raising concerns about performance when users struggle to get some data to show up in a timely manner, so the main trigger for such an investigation is the user experience (let's hope that the user in your case is QA and not the actual customer).
However, the slowness a user reports for specific endpoints may not be related to your application at all (it could be caused by their internet bandwidth or their hardware, for example), so a user report alone is not enough to confirm the problem. Which brings us to the next step!
2-Making use of monitoring tools:
They are very useful for getting a general overview of how your API behaves under its usual day-to-day usage. There are multiple great tools on the market for this purpose, but if your API is deployed in an Azure environment, you should consider Azure Application Insights and Azure Monitor. The latter was very useful for grouping and filtering all API calls based on their timing (which is what we need in this case) and on a bunch of other parameters. It's an awesome service that I consider the source of wisdom, as it helps find the slowest endpoints along with many other details that are useful for your investigation (duration of the endpoint call, count of calls...).
If Azure is not a viable option in your case, consider other external tools. I have had experience with Prefix, a great monitoring solution, especially for performance-related investigations. Its features include, but are not limited to:
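To make the idea of grouping calls by timing concrete, here is a minimal sketch in Python (chosen only for brevity). It is not how Azure's tooling works internally; it simply shows the kind of per-endpoint summary you are looking for. The sample records, the endpoint names, and the slowest_endpoints helper are all made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def slowest_endpoints(calls, top=3):
    """Group (endpoint, duration_ms) records and rank endpoints by average duration."""
    durations = defaultdict(list)
    for endpoint, duration_ms in calls:
        durations[endpoint].append(duration_ms)

    summary = [
        {
            "endpoint": endpoint,
            "count": len(values),
            "avg_ms": mean(values),
            "max_ms": max(values),
        }
        for endpoint, values in durations.items()
    ]
    # Slowest average first
    summary.sort(key=lambda row: row["avg_ms"], reverse=True)
    return summary[:top]


# Hypothetical sample data; real records would come from your monitoring tool
calls = [
    ("GET /orders", 1200.0),
    ("GET /orders", 1800.0),
    ("GET /users", 40.0),
    ("GET /users", 60.0),
    ("POST /login", 300.0),
]

for row in slowest_endpoints(calls):
    print(
        f'{row["endpoint"]}: {row["count"]} calls, '
        f'avg {row["avg_ms"]:.0f} ms, max {row["max_ms"]:.0f} ms'
    )
```

Looking at the call count next to the average matters: an endpoint that is slow but called twice a day is usually a lower priority than a moderately slow one called thousands of times.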
- Detecting all API calls throughout your navigation journey
- Filtering the registered API calls
- API call details: timing, duration, URL, HTTP response type, number of queries related to that API call, timing of each query, and the SQL queries themselves, which is super useful if you are not very familiar with MS SQL Profiler (and even more so when using an ORM such as EF)
The good news is that all of those options are included in the free version of Prefix; its premium version provides much more advanced options.
3-Reproducing the slowness via benchmarking:
When you get into the subject of load testing, you will become familiar with terms such as warmup, outliers, and debug/release environments.
The point of this step is to reproduce the slowness we are experiencing in the prod environment. But when you try, it is not easily reproducible: you hit a specific endpoint and find that it responds in a few milliseconds. Wait a moment, wasn't this endpoint flagged as slow by Azure monitoring or Prefix, based on the statistics? Yes, it is the one and only. So how could I improve or test an endpoint that is already performant? (Well, here comes the famous saying, "it works on my machine", if you know what I mean ;)
As with any other bug you try to fix (and since you are here, I guess you have dealt with more than a few; they are normally your bread and butter), the first step toward a solution is to reproduce the problem "on your machine". For that you need to simulate the environment in which the slowness happened, and to do that you will have to:
- Prepare a database with a quantity of data similar to the one in prod (or whichever environment has the slowness issue), simply because you will not be able to reproduce the issue with 15 items in each of your db tables while each prod table has thousands of rows. For this you may consider preparing a script that bulk-loads the database with data; if you are in the .NET environment, the Bogus library could be of use here.
- Simulate the application state: this means selecting the Release configuration in the case of .NET apps and doing an application warmup. Here we go, another performance-related term!
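For the first of the two steps above, a bulk-loading script could look roughly like the following sketch. It uses Python with an in-memory SQLite database only so that it runs anywhere; the orders table, its columns, and the seed_orders helper are hypothetical, and in a .NET project you would more likely generate the rows with Bogus and insert them into your real database engine.

```python
import random
import sqlite3


def seed_orders(conn, row_count, seed=42):
    """Fill a hypothetical `orders` table with `row_count` generated rows."""
    rng = random.Random(seed)  # fixed seed so every run produces the same data
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT, amount REAL)"
    )
    statuses = ["new", "paid", "shipped", "cancelled"]
    # Generator: rows are produced lazily instead of being held in one big list
    rows = (
        (rng.randint(1, 5000), rng.choice(statuses), round(rng.uniform(5, 500), 2))
        for _ in range(row_count)
    )
    # executemany runs the INSERT once for each generated row
    conn.executemany(
        "INSERT INTO orders (customer_id, status, amount) VALUES (?, ?, ?)", rows
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
seed_orders(conn, 50_000)
print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])
```

The fixed random seed is a deliberate choice: measurements taken before and after a change stay comparable only if both runs work against the same data.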
Warmup is the step of bringing the application to a READY state in order to get accurate measurements of the API call timings. Doesn't an API call behave the same throughout the whole application lifetime? The answer is no, at least not exactly, especially when you get into the details.
The first time you run the application and make an API call, there is a load of code to be compiled, dependencies to be loaded and injected, caches to be filled, queries and LINQ expressions to be compiled, and monkey-patched methods/classes to be defined or loaded on the fly with reflection or Harmony. These operations do not happen on every request; they happen mainly during the first 10-100 API calls. Because of that, performance measurements taken on the first calls are not as accurate as those taken later on, and their environment/app state is not similar to the one in which the user encountered the slowness. Therefore, you ought to run a couple of hundred calls against the API just before hitting it with the calls used for performance measurement and slowness reproduction.
The slowness reported by the monitoring tools is based on thousands of calls to a specific endpoint: what shows up as the call duration is the average of those thousands of calls' durations. So you may consider launching thousands of calls yourself (not manually, of course!). For this you will need a tool that can run this kind of scenario; again, if you are in the .NET environment, the benchmarking tool BenchmarkDotNet could be of interest.
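The warmup-then-measure idea can be sketched in a few lines. This is a simplified Python illustration, not a replacement for a real benchmarking tool: the measure helper, its default counts, and fake_endpoint_call (a stand-in for an actual HTTP request to your endpoint) are assumptions made for the example.

```python
import statistics
import time


def measure(call, warmup=200, iterations=1000):
    """Run `call` `warmup` times unmeasured, then time `iterations` calls."""
    for _ in range(warmup):
        call()  # not timed: these calls only bring the app to a steady state

    durations_ms = []
    for _ in range(iterations):
        start = time.perf_counter()
        call()
        durations_ms.append((time.perf_counter() - start) * 1000)

    durations_ms.sort()
    return {
        "mean_ms": statistics.mean(durations_ms),
        "median_ms": statistics.median(durations_ms),
        # Simple index-based approximation of the 95th percentile
        "p95_ms": durations_ms[int(0.95 * (len(durations_ms) - 1))],
        "max_ms": durations_ms[-1],
    }


# Hypothetical stand-in for an HTTP call to the endpoint under test
def fake_endpoint_call():
    sum(range(1000))


stats = measure(fake_endpoint_call, warmup=50, iterations=200)
print(stats)
```

Reporting the median and a high percentile next to the mean is one way to deal with outliers: a handful of unusually slow calls can pull the mean up a lot while barely moving the median.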
What's the point of this tedious setup?
Wise question. You are here to find solutions to the performance issues you are facing, so how does this complicated setup help? Shouldn't there be some solutions you could start your improvement plan with right away?
Yes, I hear you. There is a myriad of solutions that you could implement as part of your performance improvement plan, and we will get to them in future articles. However, those improvements may not be as effective as you expect, so you will need a way to find out whether they are effective and, if so, how effective, taking into consideration the complexity they introduce into your code base. Setting up a testing environment is crucial for verifying that a specific change results in a real improvement: trying things out with no defined statistics or results is not an accurate way to measure improvements, because you cannot know whether anything really improved. You know the saying: "Data is religion, without data you are just another guy with an opinion".
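As a small illustration of putting a number on "how effective", the sketch below compares two sets of measurements taken before and after a change. The improvement helper and both sample lists are invented for the example; in practice the durations would come from your own benchmark runs against the same data and the same warmed-up application.

```python
import statistics


def improvement(before_ms, after_ms):
    """Return the relative change in median latency in percent; positive means faster."""
    before = statistics.median(before_ms)
    after = statistics.median(after_ms)
    return (before - after) / before * 100


# Made-up durations in milliseconds, each list containing one outlier
before_ms = [820, 790, 805, 1500, 810]
after_ms = [410, 395, 400, 900, 405]

print(f"median latency reduced by {improvement(before_ms, after_ms):.1f}%")
```

With a figure like this in hand, it becomes easier to judge whether a change is worth the extra complexity it adds to the code base.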
Coming to the end of this article: if you are still interested in the performance improvement plan, keep an eye out for the next articles! Keep hacking.