Amazon Athena - Serverless Query Analysis

abimeena

Abi Meena K

Posted on March 30, 2024


Amazon Athena is a serverless, interactive query service used mainly for analysing data with standard SQL. Its main advantage is that it can run complex queries quickly without any servers to provision or manage.

Athena reads its input from S3 buckets; sources outside S3 are reachable only through federated query connectors. It is flexible about input formats such as CSV, JSON, ORC and Parquet. It follows a pay-per-query model: you pay for the amount of data each query scans, not for idle infrastructure.
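To make the pricing model concrete, here is a rough cost estimate in Python. The price of 5 USD per TB scanned and the 10 MB per-query minimum are assumptions based on commonly quoted values, so check the Athena pricing page for your region. `estimate_query_cost` is a hypothetical helper, not part of any AWS SDK.

```python
# Assumed values: verify them against the Athena pricing page for your region.
PRICE_PER_TB_USD = 5.00               # assumption: typical on-demand price per TB scanned
MIN_BYTES_PER_QUERY = 10 * 1024 ** 2  # assumption: queries are billed for at least 10 MB
BYTES_PER_TB = 1024 ** 4              # assumption: 1 TB counted as 2**40 bytes


def estimate_query_cost(bytes_scanned: int, price_per_tb: float = PRICE_PER_TB_USD) -> float:
    """Return an approximate cost in USD for one query that scans `bytes_scanned` bytes."""
    billable_bytes = max(bytes_scanned, MIN_BYTES_PER_QUERY)
    return billable_bytes / BYTES_PER_TB * price_per_tb


if __name__ == "__main__":
    # A query that scans 200 GB of CSV data.
    print(f"{estimate_query_cost(200 * 1024 ** 3):.4f} USD")
```

Because the charge depends on bytes scanned, compressed or columnar formats such as ORC usually make the same query cheaper than it would be over raw CSV.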

How It Works

Athena directly accesses data stored in S3 buckets. The data can be structured, semi-structured or unstructured, and Athena also allows cross-account access to S3 buckets. An AWS Glue crawler scans the data and automatically identifies its schema and structure. The resulting table definitions are stored in the AWS Glue Data Catalog, which acts as the central repository for information about the data (its metadata).

Note: Both the Glue crawler and the Glue Data Catalog are concerned only with the metadata, not with the actual data itself.

Athena then applies the table schema from the AWS Glue Data Catalog to the data when a query runs; the files in S3 are not modified. The data can now be queried and analysed. The results are either generated as reports or integrated with other tools such as Amazon QuickSight. With the help of AWS Key Management Service (KMS), Athena can query encrypted data and write encrypted results.
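As an illustration of encrypted results, the sketch below builds the `ResultConfiguration` dictionary that the boto3 Athena client accepts in `start_query_execution`. The helper name `build_result_configuration`, the bucket path and the key ARN are all hypothetical placeholders.

```python
from typing import Optional


def build_result_configuration(output_location: str, kms_key_arn: Optional[str] = None) -> dict:
    """Build the ResultConfiguration argument for Athena's start_query_execution.

    When a KMS key ARN is supplied, query results are written with SSE-KMS encryption.
    """
    config = {"OutputLocation": output_location}
    if kms_key_arn:
        config["EncryptionConfiguration"] = {
            "EncryptionOption": "SSE_KMS",
            "KmsKey": kms_key_arn,
        }
    return config


if __name__ == "__main__":
    # Hypothetical bucket and key; replace them with your own values.
    print(build_result_configuration(
        "s3://my-athena-results/output/",
        "arn:aws:kms:us-east-1:111122223333:key/example-key-id",
    ))
```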

Athena integrated with QuickSight

Hands On:

Step 1: Choose the dataset you wish to analyse and store it in an S3 bucket.

Step 2: Create another S3 bucket. This bucket acts as the destination bucket that stores the results of the queries.

Step 3: Open Amazon Athena and go to Settings. You will see the Query Results and Encryption Settings. Click on the Manage button and, in the pop-up, specify the destination bucket you created in Step 2.
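The result location from Step 3 can also be set from code. The sketch below is roughly equivalent, assuming the default `primary` workgroup and that boto3 is installed with credentials configured. The bucket name, the `athena-results/` prefix and both helper functions are hypothetical.

```python
def build_workgroup_update(results_bucket: str,
                           prefix: str = "athena-results/",
                           workgroup: str = "primary") -> dict:
    """Build the arguments for Athena's update_work_group call."""
    return {
        "WorkGroup": workgroup,
        "ConfigurationUpdates": {
            "ResultConfigurationUpdates": {
                "OutputLocation": f"s3://{results_bucket}/{prefix}",
            }
        },
    }


def apply_workgroup_update(request: dict) -> None:
    """Send the update to Athena (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    boto3.client("athena").update_work_group(**request)


if __name__ == "__main__":
    # Hypothetical bucket name; printing the request does not call AWS.
    print(build_workgroup_update("my-athena-results"))
```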

Step 4: Open the AWS Glue console, go to Crawlers and create a new crawler.

  • Enter a name for the crawler.

  • Select Data Stores as the Source Type.

  • Add the S3 bucket path in the Path field.

  • Select No in the Multiple Data Stores section.

  • Create an IAM role for this service.

  • Select Run On Demand in the Frequency section.

  • Click on Add Database and create a database. (This database holds the table definitions the crawler creates; the data itself stays in S3.)

  • Review the choices and click on Add Crawler.

  • Click on Finish, then run the crawler so that it creates the tables.
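The crawler setup above can be sketched with boto3 as well. This is a minimal example, assuming the IAM role already exists and has access to the bucket. The crawler name, role ARN, database name and S3 path are hypothetical placeholders, as are the two helper functions.

```python
def build_crawler_request(name: str, role_arn: str, database: str, s3_path: str) -> dict:
    """Build the arguments for Glue's create_crawler call with a single S3 target."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        # No "Schedule" key is set, so the crawler runs only on demand.
    }


def create_and_run_crawler(request: dict) -> None:
    """Create the database and crawler, then start it (needs boto3 and credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    glue = boto3.client("glue")
    # Raises an AlreadyExistsException if the database is already present.
    glue.create_database(DatabaseInput={"Name": request["DatabaseName"]})
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])


if __name__ == "__main__":
    # Hypothetical values; printing the request does not call AWS.
    print(build_crawler_request(
        "sales-crawler",
        "arn:aws:iam::111122223333:role/GlueCrawlerRole",
        "sales_db",
        "s3://my-input-bucket/sales/",
    ))
```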

Step 5: Open Athena and enter your query in the query editor. Click on the Run button. The output is displayed below the query.

Sample Query
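Running a query can also be scripted. The sketch below starts a query with boto3, polls until it finishes and turns the first page of results into a list of dictionaries. The SQL text, database name and output location are hypothetical, and the code assumes a SELECT query, where the first returned row holds the column names.

```python
import time


def rows_from_results(response: dict) -> list:
    """Convert a get_query_results response into a list of {column: value} dicts."""
    rows = [
        [cell.get("VarCharValue") for cell in row["Data"]]
        for row in response["ResultSet"]["Rows"]
    ]
    if not rows:
        return []
    header, *data = rows  # for SELECT queries the first row is the header
    return [dict(zip(header, values)) for values in data]


def run_query(sql: str, database: str, output_location: str, poll_seconds: float = 1.0) -> list:
    """Run a query and return its first page of results (needs boto3 and credentials)."""
    import boto3  # imported lazily so rows_from_results works without boto3

    athena = boto3.client("athena")
    query_id = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]

    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Query {query_id} ended in state {state}")
    return rows_from_results(athena.get_query_results(QueryExecutionId=query_id))
```

A call might look like `run_query("SELECT * FROM sales LIMIT 10", "sales_db", "s3://my-athena-results/output/")`, where the table, database and bucket are placeholders. Larger result sets would need pagination, which this sketch leaves out.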

Step 6: To integrate with QuickSight, open Amazon QuickSight and create an account.

Step 7: Create a new visualization and choose Amazon Athena as the data source.

Step 8: Select the type of representation you wish to see. The output is displayed in the Sheets tab.

Sample Visualization
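Registering Athena as a QuickSight data source can be scripted too. The sketch below only covers the data source from Step 7, not the visualization itself, and assumes the QuickSight account already exists and is allowed to use Athena. The account ID, data source ID and name are hypothetical, as are the helper functions.

```python
def build_athena_data_source(account_id: str, data_source_id: str, name: str,
                             workgroup: str = "primary") -> dict:
    """Build the arguments for QuickSight's create_data_source call with an Athena source."""
    return {
        "AwsAccountId": account_id,
        "DataSourceId": data_source_id,
        "Name": name,
        "Type": "ATHENA",
        "DataSourceParameters": {"AthenaParameters": {"WorkGroup": workgroup}},
    }


def create_data_source(request: dict) -> dict:
    """Send the request to QuickSight (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    return boto3.client("quicksight").create_data_source(**request)


if __name__ == "__main__":
    # Hypothetical identifiers; printing the request does not call AWS.
    print(build_athena_data_source("111122223333", "athena-sales", "Athena sales data"))
```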

Conclusion:

To sum up, this project set up Amazon Athena with QuickSight, using one S3 bucket for the input data and another for the query results. This straightforward setup shows how AWS services can be combined for practical, efficient data analysis.
