Fetching and reading files from S3 using Go 🔥👀
Sean O'Connor
Posted on May 26, 2021
Figuring out how to do simple tasks with the AWS SDK for a particular service can be difficult, since the AWS documentation is sometimes limited and gives you only the bare minimum. Today I'll show you how to fetch and read particular files from S3 using Go. This tutorial collates many hours of research into what should be a simple problem.
Prerequisites include:
- Go installed / previous experience with Go.
- AWS SDK set up / previous development with the AWS SDK.
Basic imports
import (
	"encoding/json"
	"fmt"
	"io/ioutil"
	"log"
	"strings"

	"github.com/aws/aws-lambda-go/lambda"
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)
Defining global variables and structs.
Start off by defining some basic structs and global variables.
type S3Bucket struct {
	Bucket string `json:"bucket"`
	Key    string `json:"key"`
}

type Metrics struct {
	RMSE string `json:"rmse"`
	MAE  string `json:"mae"`
	MAPE string `json:"mape"`
}

var pageNum int
var s3Buckets []S3Bucket
var finalMetrics []Metrics
var sess *session.Session
Initiating a session.
First, we initialise a session that the SDK uses to load credentials from the shared credentials file `~/.aws/credentials`, and create a new Amazon S3 service client.
// assign to the package-level sess (declared above) so later steps can
// reuse it; using := here would shadow it with a local variable
var err error
sess, err = session.NewSession(&aws.Config{
	Region: aws.String(conf.AWS_REGION),
})
if err != nil {
	exitErrorf("Unable to create a new session %v", err)
}
Listing items in a bucket with pagination.
The AWS docs only give an example of accessing a bucket's files using the `ListObjectsV2` function. The problem I encountered with this function is that it does not let us apply our own custom function to the results in order to filter them further. Another problem is that each request returns up to 1,000 of the objects in a bucket, including sub-paths to the files you wish to read.
`ListObjectsV2` lists all objects in our S3 bucket tree, even objects that do not contain files, so if we want to target certain objects we have to apply a function. Instead, we'll use `ListObjectsV2Pages`, which iterates over the pages of a `ListObjectsV2` operation, calling a function with the response data for each page. To stop iterating, the function returns `false`.
As shown below, I target only the `.json` files in each page and append them to the `s3Buckets` slice. This part is important, as it records the location of each file so we can then access its contents!
We pass our main bucket name as `S3_BUCKET` and our object path, if there is one, as `S3_PREFIX`.
svc := s3.New(sess)

err = svc.ListObjectsV2Pages(
	&s3.ListObjectsV2Input{
		Bucket: aws.String(conf.S3_BUCKET),
		Prefix: aws.String(conf.S3_PREFIX),
	},
	func(page *s3.ListObjectsV2Output, lastPage bool) bool {
		pageNum++
		for _, item := range page.Contents {
			// keep only the .json objects; "directory" placeholders
			// and other file types are skipped
			if strings.HasSuffix(*item.Key, ".json") {
				s3Buckets = append(s3Buckets, S3Bucket{Bucket: conf.S3_BUCKET, Key: *item.Key})
			}
		}
		return pageNum < 100
	})
if err != nil {
	exitErrorf("Unable to list items in bucket %q, %v", conf.S3_BUCKET, err)
}
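The filtering inside the callback is plain string matching, so it can be exercised without touching S3 at all. A small sketch with made-up keys (`filterJSONKeys` is a hypothetical helper, not part of the post's code):

```go
package main

import (
	"fmt"
	"strings"
)

// filterJSONKeys keeps only the object keys ending in ".json",
// mirroring the check inside the ListObjectsV2Pages callback.
func filterJSONKeys(keys []string) []string {
	var out []string
	for _, k := range keys {
		if strings.HasSuffix(k, ".json") {
			out = append(out, k)
		}
	}
	return out
}

func main() {
	keys := []string{
		"models/2021/metrics.json",
		"models/2021/",          // "directory" placeholder object
		"models/2021/model.bin", // not a metrics file
	}
	fmt.Println(filterJSONKeys(keys)) // prints: [models/2021/metrics.json]
}
```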
Accessing the object contents.
Using the `s3Buckets` slice, we access the `Bucket` and `Key` from each struct, request the object information, and then fetch the object (in other words, the file) based on that information.
for _, item := range s3Buckets {
	requestInput := &s3.GetObjectInput{
		Bucket: aws.String(item.Bucket),
		Key:    aws.String(item.Key),
	}
	result, err := svc.GetObject(requestInput)
	if err != nil {
		// skip this object rather than dereferencing a nil result below
		log.Print(err)
		continue
	}
Reading the contents into a struct
The JSON file `result` is read with the `ioutil.ReadAll()` function, which returns a byte slice. That byte slice is then decoded into a `Metrics` struct instance using the `json.Unmarshal()` function.
The best tutorial I have found on reading JSON into a struct is this one: Parsing JSON
	// note: deferred closes run when the surrounding function returns,
	// not at the end of each loop iteration
	defer result.Body.Close()

	body, err := ioutil.ReadAll(result.Body)
	if err != nil {
		log.Print(err)
		continue
	}

	// json.Unmarshal accepts the byte slice directly; no string
	// round-trip is needed
	var metrics Metrics
	if err := json.Unmarshal(body, &metrics); err != nil {
		log.Printf("unable to unmarshal %s: %v", item.Key, err)
		continue
	}

	finalMetrics = append(finalMetrics, metrics)
}
And that's it! You have now fetched JSON files from a bucket and parsed the results into a struct. In my opinion, fetching the contents of an S3 file is hugely important, especially in machine learning: as engineers we constantly want to, for example, compare past models' performance or fetch additional data features to add to our models.