Convert CSV to DynamoDB Using Lambda - (Let's Build 🏗️ Series)
awedis
Posted on December 26, 2023
Have you ever needed to convert a CSV file to actual data and store it in a database? well, this article is for you!
We are going to build a simple architecture that reads CSV files from an API, converts the file to data, and stores it inside Amazon DynamoDB.
The main parts of this article:
1- Architecture overview (Terraform)
2- About AWS Services
3- Technical Part (code)
4- Result
5- Conclusion
Architecture Overview
To make our IaC more organized I'll split the code into multiple files:
variables.tf
main.tf
lambda.tf
iam.lambda.tf
apigateway.tf
dynamodb.tf
variables.tf (The variables that will be used in our IaC)
variable "aws_account_id" {
default = "<<YOUR_AWS_ACCOUNT_ID>>"
description = "AWS Account ID"
}
variable "region" {
default = "eu-west-1"
description = "AWS Region"
}
main.tf
terraform {
required_version = "1.5.1"
required_providers {
aws = {
source = "hashicorp/aws"
version = "5.22.0"
}
}
}
provider "aws" {
region = var.region
}
lambda.tf (We can see our Lambda function has permission to put logs to CloudWatch also put items to DynamoDB)
resource "aws_cloudwatch_log_group" "log_group" {
name = "/aws/lambda/${aws_lambda_function.excel_lambda.function_name}"
retention_in_days = 7
lifecycle {
prevent_destroy = false
}
}
resource "aws_iam_policy" "excel_function_policy" {
name = "excel-function-policy"
policy = jsonencode({
"Version" : "2012-10-17",
"Statement" : [
{
Effect: "Allow",
Action: [
"logs:CreateLogStream",
"logs:PutLogEvents"
],
Resource: "arn:aws:logs:*:*:*"
},
{
Effect: "Allow",
Action: [
"dynamodb:PutItem"
],
Resource: "arn:aws:dynamodb:eu-west-1:${var.aws_account_id}:table/ExcelTable"
}
]
})
}
resource "aws_iam_role_policy_attachment" "excel_function_policy_attachment" {
role = aws_iam_role.excel_lambda_role.id
policy_arn = aws_iam_policy.excel_function_policy.arn
}
resource "aws_lambda_function" "excel_lambda" {
filename = "./main.zip"
function_name = "ExcelFunction"
handler = "main"
runtime = "go1.x"
role = aws_iam_role.excel_lambda_role.arn
memory_size = "128"
timeout = "3"
source_code_hash = filebase64sha256("./main.zip")
environment {
variables = {
DYNAMODB_TABLE_NAME = aws_dynamodb_table.excel_table.name
}
}
}
resource "aws_lambda_permission" "allow_api" {
statement_id = "AllowAPIgatewayInvokation"
action = "lambda:InvokeFunction"
function_name = aws_lambda_function.excel_lambda.function_name
principal = "apigateway.amazonaws.com"
}
iam.lambda.tf (Here we create the STS assume role, so that our Lambda functions are allowed to assume role)
resource "aws_iam_role" "excel_lambda_role" {
name = "excel_lambda_role"
assume_role_policy = jsonencode({
Version = "2012-10-17",
Statement = [{
Action = "sts:AssumeRole",
Effect = "Allow",
Principal = {
Service = "lambda.amazonaws.com"
}
}]
})
}
apigateway.tf (We create one simple API that accepts POST method and uses /upload/excel
path")
resource "aws_apigatewayv2_api" "main" {
name = "main"
protocol_type = "HTTP"
}
resource "aws_apigatewayv2_integration" "excel_lambda_integration" {
api_id = aws_apigatewayv2_api.main.id
integration_type = "AWS_PROXY"
connection_type = "INTERNET"
description = "Excel Lambda"
integration_method = "POST"
integration_uri = aws_lambda_function.excel_lambda.invoke_arn
passthrough_behavior = "WHEN_NO_MATCH"
}
resource "aws_apigatewayv2_route" "excel_lambda_route" {
api_id = aws_apigatewayv2_api.main.id
route_key = "POST /upload/excel"
target = "integrations/${aws_apigatewayv2_integration.excel_lambda_integration.id}"
}
dynamodb.tf (As we can see our database is simple, we will save two columns for now, but since we are using a No-SQL database the structure is flexible)
resource "aws_dynamodb_table" "excel_table" {
name = "ExcelTable"
billing_mode = "PAY_PER_REQUEST"
hash_key = "id"
attribute {
name = "id"
type = "S"
}
}
About AWS Services
1- AWS Lambda: Which holds the code and the business logic
2- AWS IAM: For all the permissions inside the AWS cloud
3- Amazon DynamoDB: Key-Value database to store our data
4- Amazon API Gateway: Our API endpoint
Technical Part
Let us see our GO code now. 😎
package main
import (
"context"
"fmt"
"encoding/base64"
"github.com/aws/aws-lambda-go/events"
"github.com/aws/aws-lambda-go/lambda"
"net/http"
"github.com/tealeg/xlsx"
"strings"
"github.com/aws/aws-sdk-go/aws/session"
"github.com/aws/aws-sdk-go/aws"
"github.com/aws/aws-sdk-go/service/dynamodb"
)
var dynamoDBClient *dynamodb.DynamoDB
func init() {
region := "eu-west-1"
config := &aws.Config{
Region: aws.String(region),
}
dynamoDBClient = dynamodb.New(session.Must(session.NewSession()), config)
}
func handler(ctx context.Context, request events.APIGatewayProxyRequest) (events.APIGatewayProxyResponse, error) {
fileContent := request.Body
decodedContent, err := base64.StdEncoding.DecodeString(request.Body)
if err != nil {
return events.APIGatewayProxyResponse{
StatusCode: http.StatusBadRequest,
Body: "Failed to decode base64 file content",
}, nil
}
fileContent = string(decodedContent)
xlFile, err := xlsx.OpenBinary([]byte(fileContent))
if err != nil {
return events.APIGatewayProxyResponse{
StatusCode: http.StatusBadRequest,
Body: "Failed to open Excel file",
}, nil
}
var rows []string
for _, sheet := range xlFile.Sheets {
for _, row := range sheet.Rows {
var cells []string
for _, cell := range row.Cells {
text := cell.String()
cells = append(cells, text)
}
rows = append(rows, strings.Join(cells, "\t"))
err := saveRowToDynamoDB(cells)
if err != nil {
return events.APIGatewayProxyResponse{
StatusCode: http.StatusInternalServerError,
Body: fmt.Sprintf("Failed to save row to DynamoDB: %v", err),
}, nil
}
}
}
rowsString := strings.Join(rows, "\n")
return events.APIGatewayProxyResponse{
StatusCode: http.StatusOK,
Body: fmt.Sprintf("Rows uploaded:\n%s", rowsString),
}, nil
}
func saveRowToDynamoDB(cells []string) error {
tableName := "ExcelTable"
primaryKey := "id"
item := map[string]*dynamodb.AttributeValue{
primaryKey: {
S: aws.String(cells[0]),
},
"Column1": {
S: aws.String(cells[1]),
},
}
_, err := dynamoDBClient.PutItem(&dynamodb.PutItemInput{
Item: item,
TableName: aws.String(tableName),
})
return err
}
func main() {
lambda.Start(handler)
}
Let's go over the code quickly, so the init()
function initializes the aws sdk configuration, and then we have the handler()
function which first reads the request body and then decodes the content. Later we use xlsx.OpenBinary
to read the Excel file content, loop over the sheets and the rows, and take the needed data out from it. After that, we have saveRowToDynamoDB()
which queries our DynamoDB and puts the needed data to it.
📋 Note: Regarding the primary key it's highly recommended to use a unique generated string using some libraries. But for our case to keep things simple we are using the first column.
To build the code, we will run the following command:
GOARCH=amd64 GOOS=linux go build -o main
Result
First let's hit our API using Postman, make sure you upload the file the same way I do.
As we can see the Content-Type
key with the value multipart/form-data
Now we need to pass the file using form-data, with the file
field
Now finally we see the result inside the DynamoDB
Conclusion
Converting Excel data to a real backend database is one of the most needed features in many companies, it can be used in several places. In this article, we saw how we can build this architecture using some AWS services.
If you did like my content, and want to see more, feel free to connect with me on 👤➡️ Awedis LinkedIn, happy to guide or help anything that needs clarification 😊💁
Posted on December 26, 2023
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.
Related
November 12, 2023
October 24, 2023