Fetching and reading files from S3 using Go 🔥👀

seanyboi

Sean O'Connor

Posted on May 26, 2021

Trying to figure out how to do simple tasks using the AWS SDK for particular services can be difficult given that sometimes the AWS documentation is limited and gives you the bare minimum. Today I'll show you how to fetch and read particular files from S3 using Go. This tutorial collates many hours of research into what should be a simple problem.

Prerequisites include:

  • Go installed / previous experience with Go.
  • AWS-SDK set up / previous development with AWS-SDK.

Basic imports

import (
    "encoding/json"
    "fmt"
    "io/ioutil"
    "log"
    "strings"

    "github.com/aws/aws-lambda-go/lambda"
    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/s3"
)

Defining global variables and structs.

Start off by defining some basic structs and global variables.


type S3Bucket struct {
    Bucket string `json:"bucket"`
    Key    string `json:"key"`
}

type Metrics struct {
    RMSE string `json:"rmse"`
    MAE  string `json:"mae"`
    MAPE string `json:"mape"`
}

var pageNum int = 0
var s3Buckets []S3Bucket
var finalMetrics []Metrics
var sess *session.Session

Initiating a session.

First, we initialise a session, which the SDK uses to load credentials from the shared credentials file ~/.aws/credentials. The Amazon S3 service client is then created from this session.

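// Assign to the package-level sess with "=" rather than ":=", which would
// declare a new local variable and leave the global one nil.
var err error
sess, err = session.NewSession(&aws.Config{
    Region: aws.String(conf.AWS_REGION),
})

if err != nil {
    exitErrorf("Unable to create a new session %v", err)
}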
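
The exitErrorf helper is called in these snippets but never defined. Here is a minimal sketch of what it might look like, assuming it simply prints the formatted message to stderr and exits with a non-zero status. The formatError function is a hypothetical split-out, added here only so the formatting can be checked without exiting the program.

```go
package main

import (
    "fmt"
    "os"
)

// formatError builds the message that exitErrorf prints (hypothetical helper).
func formatError(msg string, args ...interface{}) string {
    return fmt.Sprintf(msg, args...)
}

// exitErrorf prints the formatted message to stderr and exits with status 1.
func exitErrorf(msg string, args ...interface{}) {
    fmt.Fprintln(os.Stderr, formatError(msg, args...))
    os.Exit(1)
}

func main() {
    var err error // stands in for the error returned by session.NewSession
    if err != nil {
        exitErrorf("Unable to create a new session %v", err)
    }
    fmt.Println("session created")
}
```

Exiting is fine for a small script; inside a Lambda handler you would more likely return the error to the caller instead.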

Listing items in a bucket with pagination.

The AWS docs only give an example of accessing a bucket's files using the ListObjectsV2 function. The problem I encountered is that this function does not let us apply our own custom function to the results in order to filter them further. Another problem is that it returns at most 1,000 objects per request, so larger buckets need several requests. The results also include the sub-paths leading to the files you wish to read.

ListObjectsV2 lists every object under our S3 bucket tree, including keys that are only sub-paths rather than files. To target certain objects, we have to apply a function to the results. So instead we'll use ListObjectsV2Pages, which iterates over the pages of a ListObjectsV2 operation, calling our function with the response data for each page. To stop iterating, the function returns false.
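
To make the callback contract concrete, here is a small sketch that does not touch AWS at all. forEachPage is a hypothetical stand-in for the SDK's paging loop, and the pages are plain string slices rather than real responses. It only illustrates how the boolean returned by the callback decides whether the next page is visited.

```go
package main

import "fmt"

// forEachPage is a hypothetical stand-in for the paging loop: it calls fn once
// per page and stops as soon as fn returns false.
func forEachPage(pages [][]string, fn func(page []string, lastPage bool) bool) {
    for i, page := range pages {
        lastPage := i == len(pages)-1
        if !fn(page, lastPage) {
            return
        }
    }
}

// collectKeys gathers keys page by page and stops once maxPages have been read.
func collectKeys(pages [][]string, maxPages int) []string {
    var keys []string
    pageNum := 0
    forEachPage(pages, func(page []string, lastPage bool) bool {
        pageNum++
        keys = append(keys, page...)
        return pageNum < maxPages // false stops the iteration
    })
    return keys
}

func main() {
    pages := [][]string{{"a.json", "b.json"}, {"c.json"}, {"d.json"}}
    fmt.Println(collectKeys(pages, 2))   // prints "[a.json b.json c.json]"
    fmt.Println(collectKeys(pages, 100)) // prints "[a.json b.json c.json d.json]"
}
```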

As shown below, I wish to target only the .json files in each page and append them to the s3Buckets slice. This part is important as it tells us the location of each file so we can then access the contents!

We pass the bucket name as S3_BUCKET and, if there is one, the object path as S3_PREFIX.

svc := s3.New(sess)
err = svc.ListObjectsV2Pages(&s3.ListObjectsV2Input{Bucket: aws.String(conf.S3_BUCKET), Prefix: aws.String(conf.S3_PREFIX)},
    func(page *s3.ListObjectsV2Output, lastPage bool) bool {
        pageNum++
        for _, item := range page.Contents {
            // Match on the extension so keys that merely contain "json" are skipped.
            if strings.HasSuffix(*item.Key, ".json") {
                s3Buckets = append(s3Buckets, S3Bucket{Bucket: conf.S3_BUCKET, Key: *item.Key})
            }
        }
        // Returning false stops the iteration; this caps the listing at 100 pages.
        return pageNum < 100
    })

if err != nil {
    exitErrorf("Unable to list items in bucket %q, %v", conf.S3_BUCKET, err)
}
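
When filtering keys, a suffix check is stricter than a substring match: a key can contain "json" somewhere in its path without being a JSON file. The sketch below shows the difference on a few made-up keys. The jsonKeys function and the key names are assumptions for illustration only.

```go
package main

import (
    "fmt"
    "strings"
)

// jsonKeys returns only the keys that end in ".json".
func jsonKeys(keys []string) []string {
    var out []string
    for _, k := range keys {
        if strings.HasSuffix(k, ".json") {
            out = append(out, k)
        }
    }
    return out
}

func main() {
    keys := []string{
        "models/run-1/metrics.json",
        "models/run-1/",                // sub-path only, not a file
        "models/json-exports/data.csv", // contains "json" but is not a JSON file
    }
    fmt.Println(jsonKeys(keys)) // prints "[models/run-1/metrics.json]"
}
```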

Accessing the object contents.

Using the s3Buckets slice, we take the Bucket and Key from each struct, build the request for the 'Object' (in other words, the file), and then fetch the object with that request.

for _, item := range s3Buckets {
    requestInput := &s3.GetObjectInput{
        Bucket: aws.String(item.Bucket),
        Key:    aws.String(item.Key),
    }

    result, err := svc.GetObject(requestInput)
    if err != nil {
        // Skip this object; there is no body to read when the request fails.
        log.Print(err)
        continue
    }

Reading the contents into a slice

The JSON file 'result' is read with the ioutil.ReadAll() function, which returns a byte slice that is decoded into a Metrics struct instance using the json.Unmarshal() function.

The best tutorial I have found regarding reading JSON into a struct is this one: Parsing JSON

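    // Read the whole object, then close the body straight away; a defer
    // inside the loop would keep every body open until the function returns.
    body, err := ioutil.ReadAll(result.Body)
    result.Body.Close()
    if err != nil {
        log.Print(err)
        continue
    }

    // body is already a []byte, so it can be passed to json.Unmarshal directly.
    var metrics Metrics
    if err := json.Unmarshal(body, &metrics); err != nil {
        log.Printf("unable to parse %s: %v", item.Key, err)
        continue
    }

    finalMetrics = append(finalMetrics, metrics)

}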
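
To see the decoding step in isolation, here is a small sketch that parses a sample payload into the Metrics struct without any S3 calls. The parseMetrics helper and the sample values are made up for illustration; the struct mirrors the one defined earlier.

```go
package main

import (
    "encoding/json"
    "fmt"
)

// Metrics mirrors the struct defined earlier in the post.
type Metrics struct {
    RMSE string `json:"rmse"`
    MAE  string `json:"mae"`
    MAPE string `json:"mape"`
}

// parseMetrics decodes one JSON document into a Metrics value.
func parseMetrics(body []byte) (Metrics, error) {
    var m Metrics
    if err := json.Unmarshal(body, &m); err != nil {
        return Metrics{}, fmt.Errorf("parsing metrics: %w", err)
    }
    return m, nil
}

func main() {
    // Sample payload; the values are made up for illustration.
    body := []byte(`{"rmse": "0.42", "mae": "0.31", "mape": "12.5"}`)
    m, err := parseMetrics(body)
    if err != nil {
        fmt.Println(err)
        return
    }
    fmt.Println(m.RMSE, m.MAE, m.MAPE) // prints "0.42 0.31 12.5"
}
```

Because the struct fields are strings, a file that stores the metrics as bare numbers (for example "rmse": 0.42) would fail to decode; in that case the fields would need to be float64 instead.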

And that's it! You have now fetched JSON files from a certain bucket and parsed the results into a struct. In my opinion, fetching the contents of an S3 file is hugely important, especially in machine learning: as engineers we constantly want to see and compare, for example, past models' performance, or fetch additional data features to append to our models.
