Low Cost "Overkill" AWS Infrastructure for a Newborn Startup
Nicolas El Khoury
Posted on March 28, 2023
Introduction
On a cold and dark evening in December 2022, a good friend of mine calls me and says: "Nicolas, I am creating a product that is going to scale massively and revolutionize the market, and I need your help". Now, if I had a dollar for every time I heard this sentence, I would be financing trips to Mars by now.
Nevertheless, I met with the friend and his technical lead. After long hours of discussions (and daydreaming), the business model was summarized as follows: "The product is a maintenance management platform designed to help companies and vehicle owners to efficiently manage their vehicles. The product aims to automate the entire maintenance procedure and provide preventive and predictive solutions by connecting vehicles to IoT devices, which allows the monitoring of maintenance parameters in real-time."
I agreed to help them for many reasons, some of which include:
- They actually know what they are doing.
- The technical lead is remarkably intelligent.
- I trust they will make it.
My job, evidently, was to architect and implement the infrastructure, deployment, and maintenance of the application.
Requirements and Challenges
At the time of discussion, they had just finished an MVP that was poorly deployed on AWS. In fact, both my friend and the technical lead had very little experience in anything related to infrastructure and DevOps. In addition, they had little money to pay my usual fees and therefore did not want to be a big burden on me. So at first, they suggested that I set up a very basic infrastructure and deployment strategy that they could use temporarily until they raised more money.
The first thought I had was: "Those noobs don't even know what they are talking about". From my experience consulting with more than two dozen companies (from small startups to extremely large multinationals), once you start working with a bad infrastructure, chances are you will keep building on top of it until working on it becomes a living hell, and then possibly go out of business due to bad tech. I was definitely not going to be part of this scenario.
Therefore, my answer was: "No, I will do it properly". So after countless back-and-forth discussions, below is the summary of the challenges to think about while architecting the solution:
- There must be at least two environments: Develop and Production.
- The developers must be able to operate the infrastructure without having to become DevOps Engineers.
- Proper observability must be employed to quickly identify and solve issues when they happen (Because they will happen).
- The cost must be as optimized as possible.
- And finally, I set a requirement, for my sake primarily: The solution must be robust enough to minimize the number of headaches I have to suffer from in the future.
Understanding the Application
Before actually coming up with a solution, a good approach is to first understand the application itself. Therefore, as a first step, the technical lead was kind enough to walk me through the different components of the application and how to run it locally.
For simplicity, both the backend (NodeJS) and frontend (ReactJS) applications live in a single monorepo, managed through NX. The application stores its data in a PostgreSQL database. Surprisingly, the application was very well documented, a phenomenon I have rarely seen in my life. Therefore, understanding the behavior and the build steps of the application wasn't so difficult.
In about three hours, I was able to containerize, deploy, and run all the containerized application components on a single Linux machine. Amazing! First step complete.
Infrastructure Requirements
Now that the application is containerized, and all the steps documented, it is time to architect the infrastructure. Whenever I am architecting a solution, regardless of its complexity and cost, I always make sure to achieve the following characteristics:
Security: One of the most integral parts of any application is security. Robust software resists cyber attacks, such as SQL Injection, Password Attacks, Cross-Site Scripting, etc. Integrating security mechanisms into the code is a mandatory practice to ensure the safety of the system in general, and of the data layer in particular.
Availability: Refers to the probability that a system is running as required, when required, during the time it is supposed to be running. A good practice to achieve availability would be to replicate the system and application as much as possible (e.g., containers, machines, databases, etc).
Scalability: The on-demand provisioning of resources offered by the cloud allows its users to quickly scale resources in and out based on the varying load. This is absolutely important, especially for optimizing cost while serving traffic consistently.
System Observability: One of the most important mechanisms required to achieve a robust application is system visibility:
- Logging: Aggregating the application logs and displaying them in an organized fashion allows the developers to test, debug, and enhance the application.
- Tracing: Tracing requests is another important practice, making it possible to follow every request flowing in and out of the system and to rapidly find and fix errors and bottlenecks.
- Monitoring: It is essential to have accurate and reliable monitoring mechanisms in every aspect of the system. Key metrics that must be monitored include but are not limited to CPU utilization, Memory Utilization, Disk Read/Write Operations, Disk space, etc.
Infrastructure Solution
In light of all the above, and after twisting my imagination for a little bit, I came up with the architecture depicted in the diagram below (it does not display all the components used):
Networking
The infrastructure is created in the Ireland region (eu-west-1). The following network components are created (a minimal example follows the list):
- Virtual Private Cloud (VPC): To isolate the resources in a private network.
- Internet Gateway: To provide internet connectivity to the resources in the public subnets.
- NAT Gateway: To provide outbound connectivity to private resources.
- Public Subnets: One in each Availability Zone.
- Private Subnets: One in each Availability Zone.
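For illustration only, here is a minimal sketch of how such a network could be defined with the AWS CDK in TypeScript. The actual infrastructure was not necessarily provisioned this way, and the construct names and CIDR range are assumptions:

```typescript
import { Stack, StackProps } from 'aws-cdk-lib';
import { aws_ec2 as ec2 } from 'aws-cdk-lib';
import { Construct } from 'constructs';

export class NetworkStack extends Stack {
  public readonly vpc: ec2.Vpc;

  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // One VPC spanning two Availability Zones in eu-west-1, with an
    // Internet Gateway for the public subnets and a single NAT Gateway
    // for outbound connectivity from the private subnets.
    this.vpc = new ec2.Vpc(this, 'AppVpc', {
      ipAddresses: ec2.IpAddresses.cidr('10.0.0.0/16'), // assumed CIDR range
      maxAzs: 2,
      natGateways: 1, // a single NAT Gateway keeps the cost down
      subnetConfiguration: [
        { name: 'public', subnetType: ec2.SubnetType.PUBLIC, cidrMask: 24 },
        { name: 'private', subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS, cidrMask: 24 },
      ],
    });
  }
}
```

Using one NAT Gateway instead of one per Availability Zone is a deliberate cost trade-off: it avoids paying for the extra gateways, at the price of a small availability risk for outbound traffic from the private subnets.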
VPN
A VPN instance with a free license is deployed to provide secure connectivity for the developers and system administrators to the private resources in the VPC.
AWS EKS
An AWS EKS cluster is created to orchestrate the backend service of each environment. The cluster is composed of one node group of two nodes, one in each Availability Zone.
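As a rough CDK sketch (the Kubernetes version and instance type below are assumptions, not the actual values used), the cluster and its node group could look like this:

```typescript
import { Stack, StackProps } from 'aws-cdk-lib';
import { aws_ec2 as ec2, aws_eks as eks } from 'aws-cdk-lib';
import { Construct } from 'constructs';

export class EksStack extends Stack {
  constructor(scope: Construct, id: string, vpc: ec2.IVpc, props?: StackProps) {
    super(scope, id, props);

    // EKS cluster placed in the shared VPC; worker nodes live in the private subnets.
    const cluster = new eks.Cluster(this, 'AppCluster', {
      vpc,
      version: eks.KubernetesVersion.V1_24, // assumed version
      defaultCapacity: 0, // node capacity is added explicitly below
    });

    // A single managed node group of two nodes, spread across the two
    // Availability Zones by the Auto Scaling group behind it.
    cluster.addNodegroupCapacity('default-nodes', {
      minSize: 2,
      desiredSize: 2,
      maxSize: 4,
      instanceTypes: [ec2.InstanceType.of(ec2.InstanceClass.T3, ec2.InstanceSize.MEDIUM)], // assumed size
      subnets: { subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS },
    });
  }
}
```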
Application Load Balancer
An Application Load Balancer (Layer 7) is created to expose the application's endpoints and to provide the routing rules required to route traffic from the internet into the application. The load balancer is configured to serve traffic on ports 80 and 443.
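In an EKS setup the load balancer is often created and managed by the AWS Load Balancer Controller through Ingress resources; a standalone CDK sketch of an equivalent ALB, with HTTP redirected to HTTPS, might look like this (the certificate ARN and the default action are placeholders):

```typescript
import { Stack, StackProps } from 'aws-cdk-lib';
import { aws_ec2 as ec2, aws_elasticloadbalancingv2 as elbv2 } from 'aws-cdk-lib';
import { Construct } from 'constructs';

export class AlbStack extends Stack {
  constructor(scope: Construct, id: string, vpc: ec2.IVpc, certificateArn: string, props?: StackProps) {
    super(scope, id, props);

    // Internet-facing Application Load Balancer in the public subnets.
    const alb = new elbv2.ApplicationLoadBalancer(this, 'AppAlb', {
      vpc,
      internetFacing: true,
    });

    // Port 80: redirect plain HTTP to HTTPS (one common pattern).
    alb.addListener('Http', {
      port: 80,
      defaultAction: elbv2.ListenerAction.redirect({ port: '443', protocol: 'HTTPS' }),
    });

    // Port 443: terminate TLS with the ACM certificate created in eu-west-1.
    // In practice, traffic is forwarded to the target group serving the backend;
    // the fixed response here is only a placeholder default action.
    alb.addListener('Https', {
      port: 443,
      certificates: [elbv2.ListenerCertificate.fromArn(certificateArn)],
      defaultAction: elbv2.ListenerAction.fixedResponse(404, { messageBody: 'Not found' }),
    });
  }
}
```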
AWS RDS PostgreSQL
An AWS RDS PostgreSQL instance is created to persist the application's data. Both the develop and production environments are hosted on the same instance but are separated logically.
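A sketch of such a database, sized modestly and kept in a single Availability Zone to control cost (the engine version, instance size, credentials handling, and logical database names are assumptions):

```typescript
import { Stack, StackProps, RemovalPolicy } from 'aws-cdk-lib';
import { aws_ec2 as ec2, aws_rds as rds } from 'aws-cdk-lib';
import { Construct } from 'constructs';

export class DatabaseStack extends Stack {
  constructor(scope: Construct, id: string, vpc: ec2.IVpc, props?: StackProps) {
    super(scope, id, props);

    // A single, modestly sized PostgreSQL instance in the private subnets.
    // The develop and production environments share the instance as two
    // separate logical databases (e.g. "app_develop" and "app_production").
    new rds.DatabaseInstance(this, 'AppDb', {
      engine: rds.DatabaseInstanceEngine.postgres({
        version: rds.PostgresEngineVersion.VER_14, // assumed version
      }),
      vpc,
      vpcSubnets: { subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS },
      instanceType: ec2.InstanceType.of(ec2.InstanceClass.T3, ec2.InstanceSize.SMALL), // assumed size
      allocatedStorage: 20,
      multiAz: false, // single-AZ keeps the cost down; can be flipped later
      credentials: rds.Credentials.fromGeneratedSecret('app_admin'), // stored in Secrets Manager
      removalPolicy: RemovalPolicy.SNAPSHOT,
    });
  }
}
```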
Clients VM
A private virtual machine on which client tools (e.g., kubectl, a PostgreSQL client) are installed to interact with the different parts of the infrastructure.
AWS ECR
Two ECR repositories are created for the backend service, one for each environment.
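A small CDK sketch of the two repositories (the repository names and lifecycle policy are assumptions):

```typescript
import { Stack, StackProps } from 'aws-cdk-lib';
import { aws_ecr as ecr } from 'aws-cdk-lib';
import { Construct } from 'constructs';

export class EcrStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // One repository per environment for the backend image.
    for (const env of ['develop', 'production']) {
      new ecr.Repository(this, `Backend-${env}`, {
        repositoryName: `backend-${env}`,
        // Keep only a handful of recent images to avoid paying for old layers.
        lifecycleRules: [{ maxImageCount: 20 }],
      });
    }
  }
}
```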
S3 Bucket
An AWS S3 bucket is created for each environment to host the frontend application.
AWS CloudFront
An AWS CloudFront distribution is created for each environment to cache and serve the frontend application hosted on AWS S3.
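Per environment, the bucket and the distribution could be wired together roughly as follows (a sketch; the bucket settings are assumptions):

```typescript
import { Stack, StackProps } from 'aws-cdk-lib';
import {
  aws_s3 as s3,
  aws_cloudfront as cloudfront,
  aws_cloudfront_origins as origins,
} from 'aws-cdk-lib';
import { Construct } from 'constructs';

export class FrontendStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // Private bucket holding the built ReactJS assets for one environment.
    const siteBucket = new s3.Bucket(this, 'SiteBucket', {
      blockPublicAccess: s3.BlockPublicAccess.BLOCK_ALL,
    });

    // CloudFront caches and serves the assets; S3Origin wires up an
    // Origin Access Identity so only the distribution can read the bucket.
    new cloudfront.Distribution(this, 'SiteDistribution', {
      defaultBehavior: {
        origin: new origins.S3Origin(siteBucket),
        viewerProtocolPolicy: cloudfront.ViewerProtocolPolicy.REDIRECT_TO_HTTPS,
      },
      defaultRootObject: 'index.html',
    });
  }
}
```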
ACM
ACM public certificates are required for the domains. One certificate must be created in eu-west-1 to be used by the load balancer, and another in us-east-1 to be used by CloudFront, which only accepts certificates from that region.
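Because the region is fixed per stack, one way to express this requirement is two small certificate stacks, one per region (the domain name below is a placeholder):

```typescript
import { App, Stack, StackProps } from 'aws-cdk-lib';
import { aws_certificatemanager as acm } from 'aws-cdk-lib';
import { Construct } from 'constructs';

// Placeholder domain; the real domains belong to the product.
const DOMAIN = 'app.example.com';

class CertificateStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);
    new acm.Certificate(this, 'Cert', {
      domainName: DOMAIN,
      validation: acm.CertificateValidation.fromDns(), // validated via DNS records
    });
  }
}

const app = new App();
// Certificate used by the Application Load Balancer (same region as the ALB).
new CertificateStack(app, 'AlbCert', { env: { region: 'eu-west-1' } });
// CloudFront only accepts certificates from us-east-1, hence the second stack.
new CertificateStack(app, 'CloudfrontCert', { env: { region: 'us-east-1' } });
```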
CloudWatch
Infrastructure metrics and application logs are collected and displayed in CloudWatch.
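The exact dashboards and alarms are not detailed here; as an example of the kind of monitoring mentioned earlier, a CPU alarm on the RDS instance could be defined like this (the thresholds are assumptions):

```typescript
import { Stack, StackProps, Duration } from 'aws-cdk-lib';
import { aws_cloudwatch as cloudwatch } from 'aws-cdk-lib';
import { Construct } from 'constructs';

export class MonitoringStack extends Stack {
  constructor(scope: Construct, id: string, dbInstanceIdentifier: string, props?: StackProps) {
    super(scope, id, props);

    // Example metric: CPU utilization of the RDS instance.
    const dbCpu = new cloudwatch.Metric({
      namespace: 'AWS/RDS',
      metricName: 'CPUUtilization',
      dimensionsMap: { DBInstanceIdentifier: dbInstanceIdentifier },
      period: Duration.minutes(5),
    });

    // Alarm when the database CPU stays above 80% for three consecutive periods.
    new cloudwatch.Alarm(this, 'DbCpuHigh', {
      metric: dbCpu,
      threshold: 80,
      evaluationPeriods: 3,
      comparisonOperator: cloudwatch.ComparisonOperator.GREATER_THAN_THRESHOLD,
    });
  }
}
```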
Application Deployment
Now that the infrastructure was successfully architected and created, I proceeded to deploy the containerized backend services and ensure their proper connectivity to the databases. Afterward, the frontend application was built and deployed on S3.
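The deployment itself could just as well be done with kubectl and the AWS CLI; purely as an illustration, the same two steps can also be expressed through the CDK, applying a Kubernetes Deployment to the cluster and syncing the frontend build output to the bucket (the image URI, namespace, port, and build path are placeholders):

```typescript
import { Stack, StackProps } from 'aws-cdk-lib';
import { aws_eks as eks, aws_s3 as s3, aws_s3_deployment as s3deploy } from 'aws-cdk-lib';
import { Construct } from 'constructs';

export class DeployStack extends Stack {
  constructor(scope: Construct, id: string, cluster: eks.Cluster, siteBucket: s3.IBucket, props?: StackProps) {
    super(scope, id, props);

    // Backend: a minimal Kubernetes Deployment with two replicas, pulling the
    // image pushed to ECR by the pipeline. The namespace is assumed to exist.
    cluster.addManifest('backend-deployment', {
      apiVersion: 'apps/v1',
      kind: 'Deployment',
      metadata: { name: 'backend', namespace: 'develop' },
      spec: {
        replicas: 2,
        selector: { matchLabels: { app: 'backend' } },
        template: {
          metadata: { labels: { app: 'backend' } },
          spec: {
            containers: [{
              name: 'backend',
              image: '<account-id>.dkr.ecr.eu-west-1.amazonaws.com/backend-develop:latest', // placeholder
              ports: [{ containerPort: 3000 }], // assumed port
            }],
          },
        },
      },
    });

    // Frontend: copy the built ReactJS bundle into the environment's bucket.
    new s3deploy.BucketDeployment(this, 'FrontendAssets', {
      sources: [s3deploy.Source.asset('./apps/frontend/dist')], // assumed build output path
      destinationBucket: siteBucket,
    });
  }
}
```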
Continuous Delivery Pipelines
The last step before announcing the good news to the team was to automate the build and delivery steps of all the services. Evidently, none of the developers should have to perform the tedious and time-wasting tasks of building and deploying the application every time there is a change. As a matter of fact, knowing the pace at which the developers work, I expect them to push code to develop 276 million times per day.
Therefore, I used AWS CodeBuild and AWS CodePipeline to automate the steps of building and deploying the services. The diagram below depicts all the steps required to continuously deliver the frontend and backend applications:
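The exact pipeline differs per service; as a rough sketch of the backend path only, a CodeBuild project that builds the Docker image and pushes it to ECR (to be triggered by a CodePipeline source stage, e.g. GitHub or CodeCommit) could look like this:

```typescript
import { Stack, StackProps } from 'aws-cdk-lib';
import { aws_codebuild as codebuild, aws_ecr as ecr } from 'aws-cdk-lib';
import { Construct } from 'constructs';

export class BackendBuildStack extends Stack {
  constructor(scope: Construct, id: string, repo: ecr.IRepository, props?: StackProps) {
    super(scope, id, props);

    // CodeBuild project that builds the backend Docker image and pushes it to ECR.
    const project = new codebuild.PipelineProject(this, 'BackendBuild', {
      environment: {
        buildImage: codebuild.LinuxBuildImage.STANDARD_6_0,
        privileged: true, // required to run Docker inside CodeBuild
      },
      environmentVariables: {
        REPO_URI: { value: repo.repositoryUri },
        ECR_REGISTRY: { value: `${this.account}.dkr.ecr.${this.region}.amazonaws.com` },
      },
      buildSpec: codebuild.BuildSpec.fromObject({
        version: '0.2',
        phases: {
          pre_build: {
            commands: [
              // Authenticate Docker against the account's ECR registry.
              'aws ecr get-login-password --region $AWS_DEFAULT_REGION | docker login --username AWS --password-stdin $ECR_REGISTRY',
            ],
          },
          build: {
            commands: [
              // Tag the image with the commit that triggered the build and push it.
              'docker build -t $REPO_URI:$CODEBUILD_RESOLVED_SOURCE_VERSION .',
              'docker push $REPO_URI:$CODEBUILD_RESOLVED_SOURCE_VERSION',
            ],
          },
        },
      }),
    });

    // Allow the build role to push images to the repository.
    repo.grantPullPush(project);
  }
}
```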
Conclusion
Once everything was done, I met with my friend and the technical lead for a handover. They were very pleased with the outcome, stating that the infrastructure was amazing, but overkill and much more than they needed right now.
But in reality, it is not overkill. As a matter of fact, the product and the team are growing very rapidly. This solution is a skeleton that can be quickly and easily modified and scaled as needed:
- The number of backend service replicas can be easily adjusted.
- The EKS nodes can be easily scaled vertically and horizontally.
- The frontend application is on S3, which is automatically scalable.
- The database can be easily scaled vertically and horizontally.
After delivering the solution in mid-December 2022:
- The developers are happy because of the robustness and ease of use of the infrastructure.
- My friend is happy because his application is live, and is costing him less than $500 per month.
- I am happy because they never called me with a complaint.
Everybody is happy :)))) The end!!