How I spent 10 months identifying & reducing technical debt


Asad Raheem

Posted on July 16, 2020


Edit: Oh, I forgot to mention. Following these simple approaches, I was able to improve performance by up to 70% and take scalability to a whole new level by adopting an event-driven architecture.

It was October 2019, and I finally had the time to start lowering the technical-debt ratio. But where to start? How to identify the problems? How to prioritize them? Which tenant's experience should I focus on?

Identification

The ultimate goal is to improve user experience through better performance and scalability, so it makes sense to prioritize the features customers use the most. One approach would be to conduct surveys, but that didn't seem practical. Where could I get such data? Telemetry (Application Insights).

Since my cloud computing platform (Azure) capped how much data could be queried through its portal, I wrote a Python script to download and process three months of data, per tenant and in aggregate. I computed the 50th, 90th, and 99th percentiles and the average of each service's response time. After filtering the data with these stats, I prioritized and selected services for improvement accordingly.
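The post doesn't show the script, but the percentile step can be sketched as below. The service name, sample values, and the nearest-rank percentile method are all illustrative; the real script worked on exported Application Insights telemetry.

```python
# Sketch of the percentile analysis over response times (ms) per service.
# A dict stands in for the exported Application Insights data; the service
# name and samples below are hypothetical.
from statistics import mean

def percentile(samples, p):
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    k = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
    return ordered[k]

def summarize(by_service):
    stats = {}
    for service, samples in by_service.items():
        stats[service] = {
            "avg": mean(samples),
            "p50": percentile(samples, 50),
            "p90": percentile(samples, 90),
            "p99": percentile(samples, 99),
        }
    return stats

data = {"GetOrders": [120, 340, 95, 2100, 180, 150, 410, 3000, 130, 160]}
print(summarize(data)["GetOrders"])
```

Sorting by p99 (tail latency) rather than the average is what surfaces the services that hurt real users the most, since averages hide occasional very slow responses.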

Analyzing & Improving

I analyzed each selected service along with its related services. Re-designing and revamping each service entirely was not practical, so I proceeded with the following approaches:

Reducing DB Round Trips

A database round trip is very costly. I made sure the minimum number of round trips was being made, either by fetching all the required data in a single round trip or by using Redis Cache.
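The caching half of this is the cache-aside pattern. A minimal sketch, with a plain dict standing in for Redis (in production this would be `redis.get`/`redis.setex`); the key and fetch function are illustrative:

```python
# Cache-aside sketch: check the cache before hitting the database, so repeated
# reads cost zero round trips. A dict stands in for Redis here.
cache = {}
db_round_trips = 0

def fetch_from_db(key):
    global db_round_trips
    db_round_trips += 1           # each call is one (expensive) round trip
    return f"row-for-{key}"

def get(key):
    if key in cache:              # cache hit: no round trip
        return cache[key]
    value = fetch_from_db(key)    # cache miss: one round trip
    cache[key] = value            # populate for subsequent readers
    return value

for _ in range(3):
    get("tenant:42")
print(db_round_trips)             # only the first call reached the database
```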

Optimizing Queries

The ORM I was using generates optimized queries, but in some cases a query required a full table scan or fetched too much data. I introduced non-clustered indexes on tables where reads occurred more frequently than writes, and I split queries where appropriate.
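The effect of adding such an index can be demonstrated with SQLite standing in for SQL Server (the table and index names are made up): the query plan switches from a full table scan to an index search.

```python
# Index sketch: EXPLAIN QUERY PLAN shows the lookup switching from a full
# table scan to an index search once an index exists on the filtered column.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (tenant_id INTEGER, payload TEXT)")

plan_before = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM events WHERE tenant_id = 5"
).fetchone()[3]

conn.execute("CREATE INDEX ix_events_tenant ON events (tenant_id)")

plan_after = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM events WHERE tenant_id = 5"
).fetchone()[3]

print(plan_before)   # a SCAN over the whole table
print(plan_after)    # a SEARCH using ix_events_tenant
```

The trade-off the paragraph mentions is real: every index slows writes slightly, which is why it only pays off on read-heavy tables.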

Stored Procedures

Sometimes it's not possible to reduce database round trips when several tables need to be accessed or operated on. Although, from my perspective, stored procedures reduce maintainability, I used them in such situations. Slow services became high-performance ones. I also changed the parameters of existing stored procedures: instead of calling a stored procedure repeatedly for different IDs, I passed all the IDs in one go as a comma-separated string.
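The comma-separated-ID change can be sketched as follows. A plain Python function stands in for the real T-SQL procedure, and the names are illustrative; the point is that N round trips collapse into one call whose parameter is split server-side.

```python
# Batched-parameter sketch: one call with a CSV of ids instead of one call
# per id. The "procedure" is a stand-in for the real stored procedure.
calls = 0

def usp_get_orders(ids_csv):
    global calls
    calls += 1                                   # one round trip per call
    ids = [int(x) for x in ids_csv.split(",")]   # server-side split of the CSV
    return [f"order-{i}" for i in ids]

# Before: len(ids) separate calls.  After: a single call.
result = usp_get_orders(",".join(str(i) for i in [7, 8, 9]))
print(calls, result)
```

On modern SQL Server versions, a table-valued parameter or `STRING_SPLIT` would be the idiomatic server-side counterpart of this pattern.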

Sync to Async

Communicating synchronously with external resources blocks the thread. Wherever possible, I transitioned to asynchronous APIs.
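A minimal sketch of that shift using Python's asyncio, where `asyncio.sleep` stands in for a non-blocking call to an external resource (with real HTTP this would be an async client such as aiohttp or httpx):

```python
# Sync-to-async sketch: three external calls overlap instead of running
# back to back, so total wall time drops from ~0.3 s to ~0.1 s.
import asyncio
import time

async def call_external(name):
    await asyncio.sleep(0.1)   # awaiting yields the event loop to other work
    return name

async def main():
    return await asyncio.gather(*(call_external(n) for n in ("a", "b", "c")))

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results)
```

The same principle applies on a web server: while one request awaits the database or an HTTP API, the thread is free to serve other requests.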

Bulk Operations

I utilized a third-party library to bulk insert/update large numbers of records.
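The post doesn't name the library, but the idea is the same as sqlite3's `executemany` shown below: one prepared statement executed over the whole batch instead of one INSERT per record. Table and column names are illustrative.

```python
# Bulk-insert sketch: a single batched call instead of 1,000 separate INSERTs.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")

rows = [(i, i * 9.99) for i in range(1000)]
conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)   # one batched call
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(count)
```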

Breaking Services

Some services were returning too much unrelated data, which blocked the front-end application from rendering its components even though little of that information needed to be shown. I broke such services apart, reusing parts of the original response model to minimize the impact on the front-end application.

Concurrency Issues

It turns out customers sometimes use a feature under unexpected conditions. I needed to resolve the resulting concurrency issues without hurting scalability and performance, especially for power-user features, and utilized a serverless compute service (Function App) to cater for this.

Event-Triggered Tasks

A user shouldn't necessarily have to wait for the entire service to complete. Sometimes the job is complex and unavoidably slow because of the several stages involved. I broke such services into phases and utilized a serverless compute service (Function App) along with real-time messaging (Azure SignalR) so that the user doesn't have to wait on the same web page. This also reduced load on the server and provided better scalability.
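The shape of this split can be sketched with a queue and a background worker standing in for the Function App trigger; the job name is made up, and the appended notification stands in for the SignalR push to the browser.

```python
# Event-driven sketch: the web request only enqueues the job and returns;
# a background worker runs the slow phases off the request path.
import queue
import threading

jobs = queue.Queue()
notifications = []

def worker():
    while True:
        job = jobs.get()
        if job is None:
            break
        # ... slow multi-stage processing happens here ...
        notifications.append(f"{job}:done")   # stand-in for a SignalR push
        jobs.task_done()

t = threading.Thread(target=worker)
t.start()

def handle_request(job_id):
    jobs.put(job_id)          # enqueue and return immediately
    return "202 Accepted"     # the user is not kept waiting

status = handle_request("report-17")
jobs.join()                   # wait for the worker (only for this demo)
jobs.put(None)
t.join()
print(status, notifications)
```

In the real setup, the queue message triggers a Function App, and the completion message reaches the user wherever they are in the application rather than on a blocked page.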

Transition to New Technologies

The ORM I was using seemed significantly slower than a newer one. I also needed new features such as dependency injection in Function Apps. Wherever easily possible, I transitioned to newer technologies for better performance.

Domain-Driven Design

Initializing a huge data-model context takes time, which is even more problematic on the consumption plan because of cold starts. I started dividing the context domain-wise so that each one holds only a small set of related tables. This began a transition from a monolithic approach to microservices, which will require incremental steps to roll out across the entire product. Nonetheless, I now have a framework set up for doing so.

I was also involved in developing new features in the meantime. It was fun, and I'm still reducing technical debt whenever I get the chance. This experience has improved my API design and coding skills.

I hope this helps.
