Efficiently Zipping Files on Amazon S3 with Node.js

Rohan Sen Sharma

Posted on July 9, 2023

Introduction:
When it comes to providing users with the ability to download multiple files from Amazon S3 in a single package, zipping those files is a common requirement. However, in a serverless environment such as AWS Lambda, there are storage and memory constraints that need to be considered. In this article, we will explore how to efficiently zip files stored on Amazon S3 using Node.js, while overcoming these limitations. We will leverage readable and writable streams, along with the archiver library, to optimize memory usage and storage space.

Prerequisites:
Before we dive into the implementation, make sure you have the following prerequisites in place:

  • An AWS account with access to the S3 service
  • Node.js and npm (Node Package Manager) installed on your machine
  • Basic knowledge of JavaScript and AWS concepts

Step 1: Configuring AWS and Dependencies:
To get started, create a new Node.js project and install the following packages using npm (a sample install command follows the list):

  • @aws-sdk/client-s3
  • @aws-sdk/lib-storage
  • @aws-sdk/s3-request-presigner
  • archiver
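
If you are starting from scratch, the commands below create the project and install these packages:

npm init -y
npm install @aws-sdk/client-s3 @aws-sdk/lib-storage @aws-sdk/s3-request-presigner archiver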

To configure your AWS credentials, provide the following configuration either in the file itself or in a separate config.js file:

const AWS_S3_BUCKET = "<aws_s3_bucket>";

const AWS_CONFIG = {
  credentials: {
    accessKeyId: "<aws_access_key_id>",
    secretAccessKey: "<aws_secret_access_key>",
  },
  region: "<aws_s3_region>",
};
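
The snippets below are assumed to live in a single module with the following imports at the top (a minimal sketch; logger is assumed to be your project's own logging utility):

import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";
import { Upload } from "@aws-sdk/lib-storage";
import { getSignedUrl } from "@aws-sdk/s3-request-presigner";
import archiver from "archiver";
import { PassThrough, Readable } from "stream";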

Step 2: Streaming Data from S3:
We begin by creating a function called getReadableStreamFromS3, which takes an S3 key as input. This function uses the GetObjectCommand utility from the @aws-sdk/client-s3 library to fetch the file from S3 and returns the file as a readable stream. By utilizing streams, we avoid storing the entire file in memory.

async function getReadableStreamFromS3(s3Key: string) {
  const client = new S3Client(AWS_CONFIG);
  const command = new GetObjectCommand({
    Bucket: AWS_S3_BUCKET,
    Key: s3Key,
  });
  const response = await client.send(command);
  // In Node.js the response Body is a readable stream, so the file's
  // contents are never buffered in memory all at once.
  return response.Body;
}
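
As a quick, hypothetical usage example (the key and local path are placeholders, and this assumes an async context), the returned stream can be piped straight to disk without buffering the whole file:

import { createWriteStream } from "fs";

const body = await getReadableStreamFromS3("reports/summary.pdf"); // hypothetical key
(body as Readable).pipe(createWriteStream("/tmp/summary.pdf"));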

Step 3: Uploading Zipped Data to S3:
Next, we create a function called getWritableStreamFromS3, which takes a destination S3 key for the zipped file as input. This function utilizes the Upload utility from the @aws-sdk/lib-storage library. Since Upload does not expose a writable stream directly, we employ a "passthrough stream" using the PassThrough object from the Node.js streams API. This object acts as a proxy for a writable stream and allows us to upload the zipped data to S3 efficiently. Any upload failure aborts the stream rather than leaving an unhandled promise rejection.

function getWritableStreamFromS3(zipFileS3Key: string) {
  const passthrough = new PassThrough();
  const client = new S3Client(AWS_CONFIG);

  // Upload consumes the passthrough stream as data is written to it.
  // done() resolves once the upload completes; if it fails, abort the
  // stream so the error is not silently swallowed.
  new Upload({
    client,
    params: {
      Bucket: AWS_S3_BUCKET,
      Key: zipFileS3Key,
      Body: passthrough,
    },
  })
    .done()
    .catch((error) => passthrough.destroy(error));

  return passthrough;
}
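
To illustrate the design choice (a hypothetical snippet; the key is a placeholder): because the function returns the PassThrough, callers can treat it like any writable stream, and whatever they write is uploaded to the S3 object as it arrives.

const out = getWritableStreamFromS3("exports/hello.txt"); // hypothetical key
out.write("hello from a stream\n");
out.end(); // end-of-data lets the Upload complete the S3 object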

Step 4: Generating and Streaming Zip Files to S3:
In this step, we create a function called generateAndStreamZipfileToS3, which takes a list of S3 keys (s3KeyList) and the destination key for the uploaded zip file (zipFileS3Key). Inside this function, we use the archiver library to create a zip archive. We iterate through the s3KeyList, fetch each file as a readable stream using getReadableStreamFromS3, and append it to the zip archive. Then, we obtain the writable stream using getWritableStreamFromS3 and pipe the zip archive to it. Finally, we await zip.finalize(), which signals that all entries have been appended and flushes the remaining archive data to the output stream.

async function generateAndStreamZipfileToS3(
  s3KeyList: string[],
  zipFileS3Key: string
) {
  try {
    const zip = archiver("zip");

    // Append each S3 object to the archive as a readable stream,
    // using the file name portion of the key as the entry name.
    for (const s3Key of s3KeyList) {
      const s3ReadableStream = await getReadableStreamFromS3(s3Key);
      zip.append(s3ReadableStream as Readable, { name: s3Key.split("/").pop()! });
    }

    const s3WritableStream = getWritableStreamFromS3(zipFileS3Key);
    zip.pipe(s3WritableStream);

    // finalize() resolves once all archive data has been written to the
    // output stream. Note that the S3 Upload itself may still be in flight;
    // in production you may also want to await its completion.
    await zip.finalize();
  } catch (error: any) {
    logger.error(`Error in generateAndStreamZipfileToS3 ::: ${error.message}`);
  }
}

Step 5: Serving the Zipped S3 File with a Presigned URL:
To provide secure access to the zipped file, we can generate a presigned URL with limited validity. In this optional step, we create a function called generatePresignedURLforZip, which takes the zipFileS3Key as input. Using GetObjectCommand from @aws-sdk/client-s3 together with the getSignedUrl utility from the @aws-sdk/s3-request-presigner library, we generate a presigned URL that expires after 24 hours. This URL can be shared with users, allowing them to download the zipped file within the specified time frame.

export async function generatePresignedURLforZip(zipFileS3Key: string) {
  logger.info("Generating Presigned URL for the zip file.");
  const client = new S3Client(AWS_CONFIG);
  const command = new GetObjectCommand({
    Bucket: AWS_S3_BUCKET,
    Key: zipFileS3Key,
  });
  // expiresIn is specified in seconds: 24 * 3600 = 24 hours.
  const signedUrl = await getSignedUrl(client, command, {
    expiresIn: 24 * 3600,
  });
  return signedUrl;
}
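
Putting it all together, a handler might look like the following hypothetical sketch (the destination key is a placeholder, not part of the original implementation):

export async function zipAndShare(s3KeyList: string[]) {
  const zipFileS3Key = "zips/download.zip"; // hypothetical destination key

  // Zip the files and stream the archive to S3.
  await generateAndStreamZipfileToS3(s3KeyList, zipFileS3Key);

  // Note: getSignedUrl does not check that the object exists yet, so make
  // sure the upload has finished before users follow the link.
  return generatePresignedURLforZip(zipFileS3Key);
}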

Conclusion:
By leveraging the power of readable and writable streams in combination with the archiver library, we can efficiently zip files stored on Amazon S3 in a serverless environment. This approach keeps memory usage and on-disk storage to a minimum, enabling us to handle large files without overwhelming our resources. Additionally, by using presigned URLs, we can securely share the zipped files with users for a limited duration. Next time you need to provide users with a convenient way to download multiple files from S3, consider implementing this solution with Node.js.

Remember to incorporate robust error handling in your actual implementation. Happy coding!
