Convert CSV to DynamoDB Using Lambda - (Let's Build 🏗️ Series)

awedis

Posted on December 26, 2023

Have you ever needed to convert a CSV or Excel file into actual data and store it in a database? Well, this article is for you!

We are going to build a simple architecture that receives a spreadsheet file through an API, parses it into rows, and stores them inside Amazon DynamoDB. (The demo code below parses XLSX files, but the same flow applies to CSV.)

The main parts of this article:
1- Architecture overview (Terraform)
2- About AWS Services
3- Technical Part (code)
4- Result
5- Conclusion

Architecture Overview

To keep our IaC organized, I'll split the code into multiple files:

  • variables.tf
  • main.tf
  • lambda.tf
  • iam.lambda.tf
  • apigateway.tf
  • dynamodb.tf

variables.tf (The variables that will be used in our IaC)

variable "aws_account_id" {
  default     = "<<YOUR_AWS_ACCOUNT_ID>>"
  description = "AWS Account ID"
}

variable "region" {
  default     = "eu-west-1"
  description = "AWS Region"
}

main.tf

terraform {
  required_version = "1.5.1"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "5.22.0"
    }
  }
}

provider "aws" {
  region = var.region
}

lambda.tf (Our Lambda function is granted permission to write logs to CloudWatch and to put items into DynamoDB)

resource "aws_cloudwatch_log_group" "log_group" {
  name              = "/aws/lambda/${aws_lambda_function.excel_lambda.function_name}"
  retention_in_days = 7
  lifecycle {
    prevent_destroy = false
  }
}

resource "aws_iam_policy" "excel_function_policy" {
  name = "excel-function-policy"
  policy = jsonencode({
    Version = "2012-10-17",
    Statement = [
      {
        Effect = "Allow",
        Action = [
          "logs:CreateLogStream",
          "logs:PutLogEvents"
        ],
        Resource = "arn:aws:logs:*:*:*"
      },
      {
        Effect = "Allow",
        Action = [
          "dynamodb:PutItem"
        ],
        Resource = "arn:aws:dynamodb:${var.region}:${var.aws_account_id}:table/ExcelTable"
      }
    ]
  })
}

resource "aws_iam_role_policy_attachment" "excel_function_policy_attachment" {
  role       = aws_iam_role.excel_lambda_role.id
  policy_arn = aws_iam_policy.excel_function_policy.arn
}

resource "aws_lambda_function" "excel_lambda" {
  filename         = "./main.zip"
  function_name    = "ExcelFunction"
  handler          = "main"
  runtime          = "go1.x"
  role             = aws_iam_role.excel_lambda_role.arn
  memory_size      = 128
  timeout          = 3
  source_code_hash = filebase64sha256("./main.zip")

  environment {
    variables = {
      DYNAMODB_TABLE_NAME = aws_dynamodb_table.excel_table.name
    }
  }
}

resource "aws_lambda_permission" "allow_api" {
  statement_id  = "AllowAPIGatewayInvocation"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.excel_lambda.function_name
  principal     = "apigateway.amazonaws.com"
  source_arn    = "${aws_apigatewayv2_api.main.execution_arn}/*/*"
}

iam.lambda.tf (Here we create the IAM role with an STS trust policy, so that the Lambda service is allowed to assume it)

resource "aws_iam_role" "excel_lambda_role" {
  name = "excel_lambda_role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17",
    Statement = [{
      Action = "sts:AssumeRole",
      Effect = "Allow",
      Principal = {
        Service = "lambda.amazonaws.com"
      }
    }]
  })
}

apigateway.tf (We create one simple HTTP API that accepts the POST method on the /upload/excel path)

resource "aws_apigatewayv2_api" "main" {
  name          = "main"
  protocol_type = "HTTP"
}

resource "aws_apigatewayv2_integration" "excel_lambda_integration" {
  api_id           = aws_apigatewayv2_api.main.id
  integration_type = "AWS_PROXY"

  connection_type           = "INTERNET"
  description               = "Excel Lambda"
  integration_method        = "POST"
  integration_uri           = aws_lambda_function.excel_lambda.invoke_arn
  passthrough_behavior      = "WHEN_NO_MATCH"
}

resource "aws_apigatewayv2_route" "excel_lambda_route" {
  api_id    = aws_apigatewayv2_api.main.id
  route_key = "POST /upload/excel"

  target = "integrations/${aws_apigatewayv2_integration.excel_lambda_integration.id}"
}
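One thing to note: an HTTP API isn't reachable until it has a stage. A minimal $default stage with auto-deploy could round out this file (the resource name "default" is my own choice):

```hcl
# Deploys every change automatically to the $default stage,
# so the routes above become invokable.
resource "aws_apigatewayv2_stage" "default" {
  api_id      = aws_apigatewayv2_api.main.id
  name        = "$default"
  auto_deploy = true
}
```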

dynamodb.tf (As we can see, our table is simple; we store two attributes for now, but since we are using a NoSQL database the structure is flexible)

resource "aws_dynamodb_table" "excel_table" {
  name           = "ExcelTable"
  billing_mode   = "PAY_PER_REQUEST"
  hash_key       = "id"
  attribute {
    name = "id"
    type = "S"
  }
}

About AWS Services

1- AWS Lambda: Holds the code and the business logic
2- AWS IAM: Manages all the permissions inside the AWS cloud
3- Amazon DynamoDB: Key-value database to store our data
4- Amazon API Gateway: Our API endpoint

Technical Part

Let's see our Go code now. 😎

package main

import (
    "context"
    "encoding/base64"
    "fmt"
    "net/http"
    "strings"

    "github.com/aws/aws-lambda-go/events"
    "github.com/aws/aws-lambda-go/lambda"
    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/dynamodb"
    "github.com/tealeg/xlsx"
)

var dynamoDBClient *dynamodb.DynamoDB

func init() {
    region := "eu-west-1"
    config := &aws.Config{
        Region: aws.String(region),
    }
    dynamoDBClient = dynamodb.New(session.Must(session.NewSession()), config)
}

func handler(ctx context.Context, request events.APIGatewayProxyRequest) (events.APIGatewayProxyResponse, error) {
    // API Gateway delivers binary payloads base64-encoded and sets
    // IsBase64Encoded; only decode when it actually did so.
    fileContent := []byte(request.Body)
    if request.IsBase64Encoded {
        decoded, err := base64.StdEncoding.DecodeString(request.Body)
        if err != nil {
            return events.APIGatewayProxyResponse{
                StatusCode: http.StatusBadRequest,
                Body:       "Failed to decode base64 file content",
            }, nil
        }
        fileContent = decoded
    }

    xlFile, err := xlsx.OpenBinary(fileContent)
    if err != nil {
        return events.APIGatewayProxyResponse{
            StatusCode: http.StatusBadRequest,
            Body:       "Failed to open Excel file",
        }, nil
    }

    var rows []string
    for _, sheet := range xlFile.Sheets {
        for _, row := range sheet.Rows {
            var cells []string
            for _, cell := range row.Cells {
                text := cell.String()
                cells = append(cells, text)
            }

            rows = append(rows, strings.Join(cells, "\t"))
            err := saveRowToDynamoDB(cells)
            if err != nil {
                return events.APIGatewayProxyResponse{
                    StatusCode: http.StatusInternalServerError,
                    Body:       fmt.Sprintf("Failed to save row to DynamoDB: %v", err),
                }, nil
            }
        }
    }

    rowsString := strings.Join(rows, "\n")

    return events.APIGatewayProxyResponse{
        StatusCode: http.StatusOK,
        Body:       fmt.Sprintf("Rows uploaded:\n%s", rowsString),
    }, nil
}

func saveRowToDynamoDB(cells []string) error {
    if len(cells) < 2 {
        // Skip rows that don't contain both expected columns.
        return nil
    }

    // Hardcoded here for simplicity; this could also be read from the
    // DYNAMODB_TABLE_NAME environment variable our Terraform sets.
    tableName := "ExcelTable"
    primaryKey := "id"

    item := map[string]*dynamodb.AttributeValue{
        primaryKey: {
            S: aws.String(cells[0]),
        },
        "Column1": {
            S: aws.String(cells[1]),
        },
    }

    _, err := dynamoDBClient.PutItem(&dynamodb.PutItemInput{
        Item:      item,
        TableName: aws.String(tableName),
    })
    return err
}

func main() {
    lambda.Start(handler)
}

Let's go over the code quickly. The init() function initializes the AWS SDK configuration. The handler() function reads the request body and decodes it from base64, then uses xlsx.OpenBinary to parse the Excel content, loops over the sheets and rows, and extracts the data we need. Finally, saveRowToDynamoDB() writes each row into our DynamoDB table with PutItem.
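As a side note, the base64 round trip the handler relies on can be illustrated with the standard library alone. The bytes below are a stand-in for a real XLSX payload, and roundTrip is a hypothetical helper just for this sketch:

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// roundTrip encodes raw bytes the way API Gateway would deliver them
// (base64 in the request body) and decodes them back, mirroring what
// our handler does with request.Body.
func roundTrip(raw []byte) ([]byte, error) {
	encoded := base64.StdEncoding.EncodeToString(raw)
	return base64.StdEncoding.DecodeString(encoded)
}

func main() {
	decoded, err := roundTrip([]byte("fake-xlsx-bytes"))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(decoded)) // prints "fake-xlsx-bytes"
}
```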

📋 Note: Regarding the primary key, it's highly recommended to use a uniquely generated string from a library. But for our case, to keep things simple, we are using the first column.
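If you want unique keys without pulling in a dependency, a stdlib-only sketch could look like this. newID is a hypothetical helper, not part of the article's code; a library such as github.com/google/uuid would give you proper UUIDs:

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// newID returns a random 128-bit value as a 32-character hex string,
// a simple stand-in for a real UUID library.
func newID() string {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		panic(err) // crypto/rand reads should not fail in practice
	}
	return hex.EncodeToString(b)
}

func main() {
	fmt.Println(newID())
}
```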

To build the code and package it into the main.zip archive that our Terraform expects, we will run the following commands:

GOARCH=amd64 GOOS=linux go build -o main
zip main.zip main

Result

First, let's hit our API using Postman. Make sure you upload the file the same way I do.

In the request headers, set the Content-Type key to the value multipart/form-data.

Then pass the file in the request body using form-data, under the file field.

Our Excel file is test.xlsx.

Finally, we can see the result inside the DynamoDB table.

Conclusion

Converting Excel data into a real backend database is a commonly needed feature in many companies, and it can be applied in several places. In this article, we saw how to build this architecture using a few AWS services.

If you liked my content and want to see more, feel free to connect with me on 👤➡️ Awedis LinkedIn; I'm happy to guide or help with anything that needs clarification 😊💁
