Amazon Athena - Serverless Query Analysis
Abi Meena K
Posted on March 30, 2024
Amazon Athena is a serverless, interactive query service used mainly for analysing data. Its main advantage is that it runs SQL queries, including complex ones, directly against data in Amazon S3, with no infrastructure to provision or manage.
Athena reads its input primarily from S3 buckets; other sources are reachable only through federated query connectors. It is flexible about input data formats and supports CSV, JSON, ORC and others. It follows a pay-per-query model: there is no charge for idle infrastructure, and each query is billed according to the amount of data it scans.
How It Works?
Athena directly accesses data stored in S3 buckets. The data can be structured, semi-structured or unstructured, and Athena also allows cross-account access to S3 buckets. An AWS Glue crawler scans this data and automatically identifies its schema and structure. The resulting table definitions are written to the AWS Glue Data Catalog, which acts as the central repository for information about the data.
Note: Both the crawler and the Data Catalog are concerned only with metadata, not the actual data itself.
When a query runs, Athena applies the table schema obtained from the AWS Glue Data Catalog to the data in S3. The data can now be queried and analysed. The results are either generated as reports or integrated with other tools such as Amazon QuickSight. With the help of AWS Key Management Service (KMS), Athena can query encrypted data and write encrypted results.
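To make the metadata point concrete, here is a minimal sketch of reading a table's schema back from the Glue Data Catalog. It assumes a boto3 Glue client is passed in and uses its `get_table` call; the database and table names in the usage comment are hypothetical. Only the S3 location and the column definitions come back, never the rows themselves.

```python
def describe_catalog_table(glue_client, database, table):
    """Return the S3 location and (name, type) column pairs stored for a table."""
    response = glue_client.get_table(DatabaseName=database, Name=table)
    descriptor = response["Table"]["StorageDescriptor"]
    # The catalog stores only column definitions and a pointer to the data.
    columns = [(col["Name"], col["Type"]) for col in descriptor["Columns"]]
    return descriptor["Location"], columns


# Usage (assumes boto3 is installed and AWS credentials are configured;
# "sales_db" and "orders" are hypothetical names):
#   import boto3
#   location, columns = describe_catalog_table(boto3.client("glue"), "sales_db", "orders")
```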
Hands On:
Step 1: Choose the dataset you wish to analyze and store it in an S3 bucket.
Step 2: Create another S3 bucket. This bucket acts as the destination bucket that stores the query results.
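If you prefer to script Steps 1 and 2, a rough sketch with boto3's S3 client could look like the following. The bucket names, file name and region in the usage comment are hypothetical, and the helper assumes neither bucket exists yet. Outside us-east-1, `create_bucket` needs an explicit `LocationConstraint`, which the helper handles.

```python
def prepare_buckets(s3_client, data_bucket, results_bucket, local_file, key,
                    region="us-east-1"):
    """Create the input and results buckets, then upload the dataset."""
    for bucket in (data_bucket, results_bucket):
        if region == "us-east-1":
            # us-east-1 does not accept a LocationConstraint.
            s3_client.create_bucket(Bucket=bucket)
        else:
            s3_client.create_bucket(
                Bucket=bucket,
                CreateBucketConfiguration={"LocationConstraint": region},
            )
    s3_client.upload_file(local_file, data_bucket, key)
    # Return the S3 paths used later by the crawler and by Athena.
    return f"s3://{data_bucket}/", f"s3://{results_bucket}/"


# Usage (assumes boto3 and configured credentials; all names are hypothetical):
#   import boto3
#   prepare_buckets(boto3.client("s3", region_name="ap-south-1"),
#                   "my-athena-input", "my-athena-results",
#                   "orders.csv", "orders/orders.csv", region="ap-south-1")
```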
Step 3: Open Amazon Athena and go to Settings. Under Query Results and Encryption Settings, click the Manage button. In the pop-up, specify the destination bucket created in Step 2.
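The same result location and encryption choice can also be supplied per query through the API. As a rough sketch, the helper below builds the `ResultConfiguration` structure that Athena's `StartQueryExecution` call accepts. The bucket path and KMS key ARN are hypothetical placeholders, and SSE_KMS is only one of the available encryption options.

```python
def build_result_configuration(output_location, kms_key_arn=None):
    """Build the ResultConfiguration dict for an Athena query."""
    config = {"OutputLocation": output_location}
    if kms_key_arn is not None:
        # Ask Athena to encrypt the result files with the given KMS key.
        config["EncryptionConfiguration"] = {
            "EncryptionOption": "SSE_KMS",
            "KmsKey": kms_key_arn,
        }
    return config


# Usage (hypothetical bucket; pass a real key ARN to enable encryption):
#   build_result_configuration("s3://my-athena-results/")
```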
Step 4: Open AWS Glue Crawlers. Create a new crawler.
Enter the name of the crawler.
Select Data Stores as the Source Type.
Add the S3 bucket path in the path field.
Select No in the Multiple Data Stores section.
Create an IAM role for this service.
Select Run On Demand in the Frequency section.
Click on Add Database and create a database. (The crawler stores the table definitions it discovers in this database.)
Review the choices and click on Add Crawler.
Click on Finish.
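The console steps above can also be sketched with boto3's Glue client. This is an illustration under assumptions, not the exact setup used here: the crawler name, role ARN, database name and S3 path are hypothetical, and the IAM role is assumed to exist already. Leaving out the `Schedule` parameter keeps the crawler on demand, so it runs only when started explicitly.

```python
def create_and_start_crawler(glue_client, name, role_arn, database, s3_path):
    """Create a catalog database and an on-demand crawler, then run it once."""
    # Fails if the database already exists; create it separately in that case.
    glue_client.create_database(DatabaseInput={"Name": database})
    glue_client.create_crawler(
        Name=name,
        Role=role_arn,
        DatabaseName=database,
        Targets={"S3Targets": [{"Path": s3_path}]},  # single data store
    )
    # No Schedule was given, so the crawler runs only when started.
    glue_client.start_crawler(Name=name)


# Usage (assumes boto3 and configured credentials; all names are hypothetical):
#   import boto3
#   create_and_start_crawler(boto3.client("glue"), "orders-crawler",
#                            "arn:aws:iam::123456789012:role/GlueCrawlerRole",
#                            "sales_db", "s3://my-athena-input/orders/")
```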
Step 5: Open Athena and enter your queries. Click the Run button. The output is displayed below the query editor.
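Running a query from code follows the same pattern as the console: submit the SQL, wait for it to finish, then read the output. Below is a minimal sketch using boto3's Athena client (`start_query_execution`, `get_query_execution`, `get_query_results`). The database name, query and results path in the usage comment are hypothetical, and only the first page of results is read.

```python
import time


def run_query(athena_client, sql, database, output_location, poll_seconds=1.0):
    """Submit a query, wait for it to finish, and return the first page of rows."""
    start = athena_client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = start["QueryExecutionId"]

    # Athena runs queries asynchronously, so poll until a terminal state.
    while True:
        execution = athena_client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Query {query_id} ended in state {state}")

    result = athena_client.get_query_results(QueryExecutionId=query_id)
    # Each row is a list of cells; NULL cells have no VarCharValue key.
    return [
        [cell.get("VarCharValue") for cell in row["Data"]]
        for row in result["ResultSet"]["Rows"]
    ]


# Usage (assumes boto3 and configured credentials; all names are hypothetical):
#   import boto3
#   rows = run_query(boto3.client("athena"), "SELECT * FROM orders LIMIT 10",
#                    "sales_db", "s3://my-athena-results/")
```

For a SELECT statement, the first returned row holds the column names, so the data starts at the second row.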
Step 6: To integrate with QuickSight, open QuickSight and create an account.
Step 7: Create a new visualization and choose Amazon Athena as the input source.
Step 8: Select the type of representation you wish to see. The output is displayed in the Sheets tab.
Conclusion:
To sum up, this project set up Amazon Athena integrated with QuickSight, using S3 buckets for both the input data and the query results. This straightforward setup shows how practical and effective AWS services can be for efficient data handling.