Spatial Search of Amazon S3 Express One Zone Data with Amazon Athena and Visualized It in QGIS

dayjournal

Yasunori Kirimoto

Posted on December 18, 2023


img
img

I tried a spatial search of Amazon S3 Express One Zone data with Amazon Athena and visualized it with QGIS 🎉

I previously published an article that ran this verification with S3 Standard. This time, I repeated it with the new S3 Express One Zone announced at re:Invent 2023, focusing on its integration with Athena, spatial search, and whether it improves search speed!

S3 Express One Zone is a high-performance Amazon S3 storage class that stores data in a single Availability Zone. It is also available in the Tokyo Region and is designed to deliver up to 10 times better performance than the S3 Standard storage class, with request costs 50% lower than S3 Standard. To use it, you create a dedicated bucket type called a "directory bucket."

Advance Preparation

Prepare GIS data for use with Amazon Athena. This time, I created four sets of sample data in QGIS in advance.

I prepared point, line, and polygon GIS data as CSV files in TSV (tab-separated) format.
img

I also prepared a larger dataset of 1 million points in the same CSV (TSV) format.
img

I have registered this sample data on GitHub, so please feel free to use it.
https://github.com/dayjournal/data/tree/main/try-106
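Tab-separated files are a convenient choice here, presumably because WKT strings such as POLYGON ((x y, x y, ...)) contain commas that would clash with a comma delimiter. As a minimal sketch, assuming each file has a header row with `name` and `wkt` columns (the column names used in the queries later in this article; the actual files in the repository may contain other columns too), the data can be read in Python like this. The sample rows are made up for illustration.

```python
import csv
import io
from typing import Dict, List


def read_wkt_tsv(text: str) -> List[Dict[str, str]]:
    """Parse tab-separated text with a header row into a list of dicts.

    Assumes the header contains "name" and "wkt" columns (hypothetical layout).
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for row in reader:
        # Keep only the two columns the Athena queries rely on.
        rows.append({"name": row["name"], "wkt": row["wkt"]})
    return rows


# Hypothetical sample data; the tab delimiter leaves WKT commas untouched.
sample = "name\twkt\np1\tPOINT (139.76 35.68)\np2\tPOINT (139.70 35.69)\n"
rows = read_wkt_tsv(sample)
print(rows[0]["wkt"])  # POINT (139.76 35.68)
```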

Bucket Creation & Data Registration (S3 Express One Zone)

Create buckets and register data with Amazon S3 Express One Zone.

Click AWS Management Console → S3.
img

Click "Create Bucket".
img

Set the Region, select "Directory" as the bucket type, and specify the Availability Zone and base name.
img

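The bucket is created. Its full name combines the base name with the Availability Zone ID and the suffix "--x-s3" (in the form base-name--az-id--x-s3).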
img
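Because the full directory bucket name follows a fixed pattern, it can be derived from the base name and the Availability Zone ID. A minimal sketch, where `geospatial-data` and `apne1-az4` are hypothetical example values rather than the ones used in this article:

```python
import re

# Availability Zone IDs look like "apne1-az4": a Region code, "-az", and a number.
AZ_ID_PATTERN = re.compile(r"^[a-z0-9]+-az[0-9]+$")


def directory_bucket_name(base_name: str, az_id: str) -> str:
    """Return the full directory bucket name: base-name--az-id--x-s3."""
    if not AZ_ID_PATTERN.match(az_id):
        raise ValueError(f"not an Availability Zone ID: {az_id!r}")
    return f"{base_name}--{az_id}--x-s3"


name = directory_bucket_name("geospatial-data", "apne1-az4")
print(name)             # geospatial-data--apne1-az4--x-s3
print(f"s3://{name}/")  # the form of location to enter in Athena later
```

Note that the name uses the AZ ID (such as apne1-az4), not the AZ name (such as ap-northeast-1a), which is why the sketch rejects the latter.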

Select the target bucket → Click "Upload."
img

Select the files you want to register → Click "Upload."
img

Check the uploaded files.
img

The four CSV (TSV) files are now stored in the directory bucket.
img

This completes the data registration for S3 Express One Zone!

Set the query destination

This is how to set the query destination in Amazon Athena.

Prepare an S3 bucket with an arbitrary name for the query destination in advance.
img

Click AWS Management Console → Athena.
img

Click on "Check query editor for details."
img

Click on "View Settings."
img

Click "Manage."
img

Specify the S3 bucket where you want to save the query → Click "Save."
img

The query destination is set.
img

Setting the query destination is now complete!
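The same result location can also be supplied per query when running Athena from code. A sketch, assuming boto3 and a hypothetical results bucket named `my-athena-results-bucket`; the function only builds the keyword arguments for boto3's `start_query_execution`, so it runs without AWS credentials:

```python
from typing import Any, Dict


def athena_query_request(sql: str, database: str, output_location: str) -> Dict[str, Any]:
    """Build keyword arguments for boto3's Athena start_query_execution.

    output_location is the S3 location where Athena writes query results,
    i.e. the same setting configured under "Manage" in the console.
    """
    if not output_location.startswith("s3://"):
        raise ValueError("output_location must be an s3:// URI")
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


request = athena_query_request(
    'SELECT "name" FROM "polygon_table" LIMIT 10',
    "geospatial_database",
    "s3://my-athena-results-bucket/",  # hypothetical results bucket
)
# With AWS credentials configured, the request could be sent like this:
#   import boto3
#   boto3.client("athena").start_query_execution(**request)
```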

Table Creation

This is how to create a table in Amazon Athena.

Click the Athena editor → Create Table and View → "S3 Bucket Data."
img

Set the table name, database, target S3 location, data format, and column definitions. Check the preview → Click "Create Table."

Directory buckets (S3 Express One Zone) are currently not shown in the bucket list, so you must enter the location directly, prefixed with "s3://".
img
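Behind the form, Athena runs a CREATE EXTERNAL TABLE statement. Below is a sketch that builds a roughly equivalent statement in Python, assuming two string columns (`name`, `wkt`), a header row to skip, and a hypothetical per-dataset prefix in the directory bucket. LOCATION points to a prefix, and Athena reads every file under it, so each table presumably needs its own prefix.

```python
def create_table_ddl(database: str, table: str, location: str) -> str:
    """Return Athena DDL for a tab-separated table with `name` and `wkt` columns.

    Assumptions (hypothetical, adjust to your data): both columns are strings
    and each file starts with a header row that should be skipped.
    """
    if not location.startswith("s3://"):
        raise ValueError("location must start with s3://")
    if not location.endswith("/"):
        location += "/"  # LOCATION is a prefix; every file under it is read
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{database}`.`{table}` (\n"
        "  `name` string,\n"
        "  `wkt` string\n"
        ")\n"
        "ROW FORMAT DELIMITED\n"
        "FIELDS TERMINATED BY '\\t'\n"
        f"LOCATION '{location}'\n"
        "TBLPROPERTIES ('skip.header.line.count'='1');"
    )


print(create_table_ddl(
    "geospatial_database",
    "polygon_table",
    "s3://geospatial-data--apne1-az4--x-s3/polygon",  # hypothetical prefix
))
```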

This time, I created four tables, one for each dataset. Select the target table → Click "Preview Table."
img

The retrieved records are displayed.
img

Table creation is now complete! This confirms that Athena can read tables whose data is stored in S3 Express One Zone without any problems.

Spatial Search

Finally, here is how to do a spatial search in Amazon Athena.

First, let's get the centroid of each polygon. Download the result data.

S3 Standard
Time in queue: 0.243 sec, Run time: 0.799 sec, Data scanned: 1.5KB

S3 Express One Zone
Time in queue: 0.120 sec, Run time: 0.899 sec, Data scanned: 1.5KB

SELECT p."name", ST_Centroid(ST_GeometryFromText(p."wkt"))
FROM "geospatial_database"."polygon_table" AS p;

img

Visualize the downloaded data in QGIS to confirm the processed data.
img
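To see what ST_Centroid computes, here is a minimal planar sketch of the area-weighted centroid (shoelace formula), treating longitude and latitude as plain x/y coordinates. It only parses simple POLYGON WKT and uses the exterior ring; holes and MULTIPOLYGON are out of scope, so this is an illustration rather than a replacement for Athena's implementation.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def parse_polygon_wkt(wkt: str) -> List[Point]:
    """Parse the exterior ring of a simple 'POLYGON ((x y, x y, ...))' string."""
    start = wkt.index("((") + 2
    end = wkt.index(")", start)
    ring = []
    for pair in wkt[start:end].split(","):
        x, y = pair.split()
        ring.append((float(x), float(y)))
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # WKT rings should be closed; close it if not
    return ring


def polygon_centroid(ring: List[Point]) -> Point:
    """Area-weighted centroid of a simple closed ring (shoelace formula)."""
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0.0:
        raise ValueError("polygon has zero area")
    # Cx = sum / (6A) and 6A == 3 * area2
    return (cx / (3 * area2), cy / (3 * area2))


ring = parse_polygon_wkt("POLYGON ((0 0, 2 0, 2 2, 0 2, 0 0))")
print(polygon_centroid(ring))  # (1.0, 1.0)
```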

Next, let's get the start point of each line. Download the result data.

S3 Standard
Time in queue: 0.175 sec, Run time: 0.601 sec, Data scanned: 1.05KB

S3 Express One Zone
Time in queue: 0.119 sec, Run time: 0.948 sec, Data scanned: 1.05KB

SELECT l."name", ST_StartPoint(ST_GeometryFromText(l."wkt"))
FROM "geospatial_database"."line_table" AS l;

img

Visualize the downloaded data in QGIS to confirm the processed data.
img
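ST_StartPoint returns the first vertex of a LineString. A minimal sketch of the same idea for simple LINESTRING WKT; the helper name and the sample coordinates are hypothetical:

```python
from typing import Tuple


def linestring_start_point(wkt: str) -> Tuple[float, float]:
    """Return the first vertex of a simple 'LINESTRING (x y, x y, ...)' string."""
    text = wkt.strip()
    if not text.upper().startswith("LINESTRING"):
        raise ValueError("expected LINESTRING WKT")
    coords = text[text.index("(") + 1 : text.rindex(")")]
    x, y = coords.split(",")[0].split()
    return (float(x), float(y))


start = linestring_start_point("LINESTRING (139.70 35.68, 139.75 35.69, 139.80 35.70)")
print(f"POINT ({start[0]} {start[1]})")  # POINT (139.7 35.68)
```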

Next, let's get only the points that fall within the polygons. Download the result data.

S3 Standard
Time in queue: 0.313 sec, Run time: 1.230 sec, Data scanned: 2.01KB

S3 Express One Zone
Time in queue: 0.073 sec, Run time: 0.993 sec, Data scanned: 2.01KB

SELECT pt."name", pt."wkt"
FROM "geospatial_database"."point_table" AS pt,
     "geospatial_database"."polygon_table" AS pg
WHERE ST_Within(ST_GeometryFromText(pt."wkt"), ST_GeometryFromText(pg."wkt"));

img

Visualize the downloaded data in QGIS to confirm the processed data.
img
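ST_Within keeps a point only if it lies inside a polygon. A common way to illustrate that test is ray casting, sketched below. This is a swapped-in textbook algorithm, not necessarily how Athena evaluates ST_Within, and points lying exactly on a polygon boundary may be classified differently (ST_Within treats boundary points as not within). The square and the points are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def point_in_polygon(pt: Point, ring: List[Point]) -> bool:
    """Ray casting: cast a ray from pt towards +x and count edge crossings.

    An odd number of crossings means the point is inside. Points exactly on
    the boundary are not handled consistently by this simple version.
    """
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can cross the ray.
        if (y0 > y) != (y1 > y):
            # x coordinate where this edge meets that horizontal line
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


square = [(0, 0), (4, 0), (4, 4), (0, 4), (0, 0)]
points = {"a": (2, 2), "b": (5, 2), "c": (-1, 2)}
print([name for name, pt in points.items() if point_in_polygon(pt, square)])  # ['a']
```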

Now let's get only those of the 1 million points that fall within the polygons. Even with this much larger dataset (46.41MB scanned), the query completes in under 3 seconds. Download the result data.

S3 Standard
Time in queue: 0.220 sec, Run time: 2.832 sec, Data scanned: 46.41MB

S3 Express One Zone
Time in queue: 0.117 sec, Run time: 2.843 sec, Data scanned: 46.41MB

SELECT rp."name", rp."wkt"
FROM "geospatial_database"."randompoint_table" AS rp,
     "geospatial_database"."polygon_table" AS pg
WHERE ST_Within(ST_GeometryFromText(rp."wkt"), ST_GeometryFromText(pg."wkt"));

img

Visualize the downloaded data in QGIS to confirm the processed data.
img
img
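The points-in-polygons queries join every point with every polygon, so a point inside two overlapping polygons is returned twice (SELECT DISTINCT would remove such duplicates). The sketch below emulates that join in Python and adds a cheap bounding-box check before the exact test. The bounding-box prefilter is an illustrative addition, not something confirmed to happen inside Athena, and the shapes are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Ring = List[Point]


def inside(pt: Point, ring: Ring) -> bool:
    """Ray-casting point-in-polygon test (boundary points are ambiguous)."""
    x, y = pt
    result = False
    for i in range(len(ring)):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % len(ring)]
        if (y0 > y) != (y1 > y):
            if x < x0 + (y - y0) * (x1 - x0) / (y1 - y0):
                result = not result
    return result


def bounding_box(ring: Ring) -> Tuple[float, float, float, float]:
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    return min(xs), min(ys), max(xs), max(ys)


def points_within_polygons(
    points: Dict[str, Point], polygons: Dict[str, Ring]
) -> List[Tuple[str, str]]:
    """Emulate the cross join + ST_Within filter.

    Returns one (point_name, polygon_name) pair per match, so a point that
    lies in several overlapping polygons appears several times.
    """
    boxes = {name: bounding_box(ring) for name, ring in polygons.items()}
    matches = []
    for point_name, (x, y) in points.items():
        for polygon_name, ring in polygons.items():
            min_x, min_y, max_x, max_y = boxes[polygon_name]
            # Cheap rejection: skip polygons whose bounding box excludes the point.
            if not (min_x <= x <= max_x and min_y <= y <= max_y):
                continue
            if inside((x, y), ring):
                matches.append((point_name, polygon_name))
    return matches


square_a = [(0, 0), (4, 0), (4, 4), (0, 4), (0, 0)]
square_b = [(2, 2), (6, 2), (6, 6), (2, 6), (2, 2)]
pts = {"p1": (3, 3), "p2": (1, 1), "p3": (10, 10)}
print(points_within_polygons(pts, {"A": square_a, "B": square_b}))
```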

By using Amazon Athena, a spatial search of data registered in S3 becomes possible!

This verification confirmed that Athena also works with S3 Express One Zone and can run spatial searches on data stored there.

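As for spatial search, time in queue was shorter with S3 Express One Zone for every query, and the points-within-polygons query ran about 1.2 times faster (0.993 sec vs. 1.230 sec). However, run times did not improve overall: the centroid and start-point queries ran slower, and the 1-million-point query took about the same time. This may be because S3 Express One Zone is optimized for workloads that make many requests against small files, and may not be well suited to the spatial search workload used in this verification.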
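The comparison can be recomputed from the timings listed for each query above. A small sketch; the query labels are hypothetical, and the numbers are copied from this article:

```python
# (queue_sec, run_sec) per query, copied from the timings in this article
TIMINGS = {
    "polygon centroid": {"standard": (0.243, 0.799), "express": (0.120, 0.899)},
    "line start point": {"standard": (0.175, 0.601), "express": (0.119, 0.948)},
    "points in polygons": {"standard": (0.313, 1.230), "express": (0.073, 0.993)},
    "1M points in polygons": {"standard": (0.220, 2.832), "express": (0.117, 2.843)},
}


def speedup(standard_sec: float, express_sec: float) -> float:
    """Ratio > 1 means S3 Express One Zone was faster; < 1 means slower."""
    return standard_sec / express_sec


for query, t in TIMINGS.items():
    queue = speedup(t["standard"][0], t["express"][0])
    run = speedup(t["standard"][1], t["express"][1])
    print(f"{query:22s} queue x{queue:.2f}  run x{run:.2f}")
```

Separating queue time from run time makes it visible that the gains came mostly before execution started, while execution itself was mixed.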

However, in terms of cost, S3 Express One Zone's request fees are 50% lower than S3 Standard's, which is a significant advantage (although its storage price per GB is higher than S3 Standard's)!

