Choosing the right AWS Database
Gaurav Raje
Posted on January 17, 2024
In this article, I will guide you through the best-suited AWS database for various use cases, discussing the advantages and disadvantages of each type. We will explore the three main categories of database systems available on AWS and delve into their specific use cases, highlighting why certain databases excel in particular scenarios. Additionally, I will share insights on how to choose the most appropriate database for your needs.
Before going further, I want to talk about the application that I will use for the purpose of this blog post.
I will use the example of a typical e-commerce website for this write-up.
Suppose you are running an e-commerce website that shows various items in a search result. Customers buy products from the site and pay on the site. Think something like Amazon.com. I will discuss which database is best used for which application part.
First, let's look at the three database categories:
NoSQL Databases: Ideal for key-value lookups or queries using specific indexes, NoSQL databases, like AWS's DynamoDB, offer consistent performance for limited lookup methods as long as the parameters are indexed.
Relational Databases: These are versatile databases designed for ad-hoc queries. They have been widely used for various use cases and are beneficial for querying data in multiple ways. While they primarily rely on indexed columns, they can occasionally handle queries using non-indexed columns.
Big Data Databases: Best for analytics with large datasets, these databases facilitate parallel processing of queries across clusters, making them suitable for instances where the data size exceeds the capacity of a single instance, particularly with complex queries.
Now, let's examine an example:
DynamoDB: A popular NoSQL database on AWS, DynamoDB is serverless and traces its design to the Dynamo paper published by Amazon. Primarily a key-value lookup database, DynamoDB is excellent for data that consistently needs key-based lookup. In DynamoDB, the primary key, either a single partition key or a combination of a partition key and a sort key, uniquely identifies each item. This scalable database ensures rapid response times for each lookup. While there are limits on item and partition sizes, DynamoDB doesn't impose explicit limits on table size.
DynamoDB excels in scenarios where a database is integral to an application, particularly when dealing with domain-driven designs. For example, in an e-commerce application, each customer and product can have a unique ID. These IDs facilitate quick lookups of related details like names, birth dates, prices, and ratings, making DynamoDB an effective choice for such use cases. However, DynamoDB's efficiency diminishes when queries filter on attributes other than the indexed keys. This is a limitation for advanced searches, such as filtering products by price or rating.
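To make the trade-off concrete, here is a minimal sketch of the two access patterns. The table and attribute names (`products`, `product_id`) are illustrative, not from the article, and an in-memory dict stands in for the table so the example is runnable; a real boto3 lookup is hinted at in the comments.

```python
# Sketch of DynamoDB's key-based access pattern (illustrative names).
# With boto3, the fast path would look roughly like:
#   table = boto3.resource("dynamodb").Table("products")
#   item = table.get_item(Key={"product_id": "P100"}).get("Item")
# Here an in-memory dict simulates the same key lookup.

products = {
    "P100": {"name": "Espresso Machine", "price": 129.99, "rating": 4.6},
    "P200": {"name": "Coffee Grinder", "price": 49.99, "rating": 4.3},
}

def get_product(product_id):
    """Fast lookup by primary key -- the access pattern DynamoDB optimizes."""
    return products.get(product_id)

def products_cheaper_than(max_price):
    """Filtering on a non-key attribute means examining every item --
    the kind of query where DynamoDB is inefficient without an index."""
    return [pid for pid, p in products.items() if p["price"] < max_price]
```

The first function models the lookup DynamoDB is built for; the second models the full-table scan you are forced into when the filter attribute is not indexed.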
Amazon Aurora:
On the other hand, Amazon Aurora, a relational database compatible with Postgres or MySQL, is more suited for ad-hoc queries that might not have been anticipated during application design. While slower in lookups compared to DynamoDB, Aurora's strength lies in its versatility and ability to handle complex queries. Its support for up to 128 TiB of storage makes it useful for large datasets. However, ensuring that the query results fit into memory is important, especially for large and complex datasets.
In the context of our e-commerce application, Aurora is ideal for advanced search functionalities or generating weekly summaries for users. The relational model's long-standing presence in the industry, dating back to the 1970s, means a wealth of knowledge and a ready pool of skilled professionals familiar with relational database management systems (RDBMS). This makes it easier to hire talent to manage these systems.
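The kind of ad-hoc query Aurora handles well can be sketched as follows. This uses sqlite3 purely as a runnable stand-in for Aurora's Postgres/MySQL interface; the schema and data are illustrative.

```python
# Ad-hoc relational query sketch. sqlite3 stands in for Aurora
# (Postgres/MySQL-compatible); schema and rows are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id TEXT, name TEXT, price REAL, rating REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?)",
    [("P100", "Espresso Machine", 129.99, 4.6),
     ("P200", "Coffee Grinder", 49.99, 4.3),
     ("P300", "Milk Frother", 24.99, 3.9)],
)

# Filtering on non-key columns (price, rating) -- awkward in DynamoDB,
# natural in a relational database:
rows = conn.execute(
    "SELECT name FROM products WHERE price < ? AND rating >= ? ORDER BY price",
    (100, 4.0),
).fetchall()
# rows == [("Coffee Grinder",)]
```

Nothing about this query had to be anticipated when the table was designed, which is the flexibility the article attributes to relational systems.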
In summary, while DynamoDB offers fast response times and is suitable for applications with straightforward lookups, its capabilities are limited for complex queries. Although slower in lookups, Amazon Aurora provides greater flexibility and better suits applications requiring diverse and unpredictable queries.
Relational databases, however, may not be the best solution for scenarios requiring complex queries on very large datasets, such as generating detailed reports for investors on customer spending patterns during holiday seasons. This requirement involves sifting through massive amounts of data and executing large-scale operations like "group by," which might be too demanding for a single database instance.
Amazon Redshift is tailored for handling extremely large datasets, perfect for running analytics on terabytes of data. Imagine conducting intricate queries involving group-bys and aggregations to devise a marketing strategy. These queries are typically run asynchronously by marketing and analytics teams rather than during live application operations. They are executed occasionally, and there is a tolerance for waiting for results. Redshift excels in these situations.
Redshift's architecture partitions a query across the nodes of a cluster, allowing for efficient big data processing, and its columnar storage makes aggregation operations more efficient.
In an e-commerce application, strategic questions that necessitate processing the entire database, especially those involving complex functions like group by or aggregates, are best handled by Redshift. Its columnar database structure significantly aids in these processes.
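A minimal sketch of that analytical workload is below. In Redshift, this SQL would be distributed across the cluster's nodes; sqlite3 is used here only so the example is self-contained, and the table and column names (`orders`, `customer_id`, `amount`) are illustrative.

```python
# Analytical GROUP BY sketch. sqlite3 stands in for Redshift so the
# example runs locally; names and data are illustrative assumptions.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id TEXT, order_month TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("C1", "2023-12", 250.0), ("C1", "2023-12", 100.0),
     ("C2", "2023-12", 75.0), ("C1", "2023-07", 40.0)],
)

# Holiday-season spend per customer -- a full-table aggregation,
# the workload a columnar warehouse like Redshift is built for:
report = conn.execute(
    """SELECT customer_id, SUM(amount) AS total
       FROM orders
       WHERE order_month = '2023-12'
       GROUP BY customer_id
       ORDER BY total DESC""",
).fetchall()
# report == [("C1", 350.0), ("C2", 75.0)]
```

Because the aggregation touches every row but only two columns, columnar storage lets the engine read just the data it needs, which is why this shape of query suits Redshift rather than a row-oriented OLTP database.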
Conclusion
In summary, this article discussed selecting the appropriate AWS database for different use cases, focusing on the advantages and limitations of three main categories: NoSQL databases, relational databases, and big data databases.
NoSQL Databases are highlighted for their efficiency in key-value lookups or queries using specific indexes, with AWS's DynamoDB as a prime example. DynamoDB is praised for its scalability and fast response times, particularly suited for data that requires consistent key-based lookups.
Relational Databases are noted for their flexibility in handling ad-hoc queries and broad application range. They are typically optimized for querying indexed columns but can occasionally handle non-indexed columns.
Big Data Databases are recommended for analytics involving large datasets. Amazon Redshift fits such scenarios, where complex queries involving operations like "group by" and aggregations are needed, especially for tasks like creating reports or analyzing market strategies. Redshift's capability to partition queries across the nodes of a cluster, along with its columnar structure, makes it highly effective for processing large-scale data.
What you choose should ultimately depend on your use case.