Using Serverless to Scan Files with ClamAV in a Lambda Container

Joseph Sutton

Posted on August 19, 2021

In my previous post, I used a Lambda function with a designated Lambda layer. The layer's binaries were built inside a Docker image based on Amazon's amazonlinux:2 image. We can use those same binaries with AWS's Lambda container image support, without worrying about the deployment size limits we ran into with the Lambda function and its layer.

History

For those who didn't read the previous post: we're setting up an S3 bucket whose object-created events trigger a Lambda function. That Lambda function is a container holding the handler code along with the ClamAV binaries and virus definitions. It fetches the S3 object using the metadata in the trigger event, scans it, and tags it as clean or dirty based on the result of the ClamAV scan.

TL;DR: Here's the GitHub repository.

Infrastructure

This is obviously going to be different: instead of using a Lambda layer, we'll be using a Docker image stored in ECR. This is nearly effortless, thanks to Serverless.

(Diagram: AWS cloud infrastructure)

Serverless

By default, Serverless will create an ECR repository for us, and the image will live in it. All we have to do is give it the path to the Dockerfile.

service: clambda-av

provider:
  name: aws
  runtime: nodejs14.x
  ecr:
    images:
      clambdaAv:
        path: ./
  iamRoleStatements:
    - Effect: Allow
      Action:
        - s3:GetObject
        - s3:PutObjectTagging
      Resource: "arn:aws:s3:::clambda-av-files/*"

functions:
  virusScan:
    image:
      name: clambdaAv
    memorySize: 2048
    events:
      - s3: 
          bucket: clambda-av-files
          event: s3:ObjectCreated:*
    timeout: 120

package:
  exclude:
    - node_modules/**
    - coverage/**
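
With that configuration in place, a single command builds the image, pushes it to the ECR repository Serverless manages for us, and deploys the stack (assuming Docker is running and your AWS credentials are configured):

serverless deploy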

Dockerfile

Since we're using JavaScript, we'll use the nodejs:14 Lambda image as the base. Unfortunately, we can't easily install the ClamAV binaries in that image, so we have to build them in the amazonlinux:2 image, as mentioned above. Fortunately, Docker lets us do that with ease via multi-stage builds. I'd never done one up until now, but it was a pretty quick and interesting process:

FROM amazonlinux:2 AS layer-image

WORKDIR /home/build

RUN set -e

RUN echo "Prepping ClamAV"

RUN rm -rf bin
RUN rm -rf lib

RUN yum update -y
RUN amazon-linux-extras install epel -y
RUN yum install -y cpio yum-utils tar.x86_64 gzip zip

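# Download the ClamAV RPMs and their runtime dependencies, then unpack them with rpm2cpio/cpio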
RUN yumdownloader -x \*i686 --archlist=x86_64 clamav
RUN rpm2cpio clamav-0*.rpm | cpio -vimd

RUN yumdownloader -x \*i686 --archlist=x86_64 clamav-lib
RUN rpm2cpio clamav-lib*.rpm | cpio -vimd

RUN yumdownloader -x \*i686 --archlist=x86_64 clamav-update
RUN rpm2cpio clamav-update*.rpm | cpio -vimd

RUN yumdownloader -x \*i686 --archlist=x86_64 json-c
RUN rpm2cpio json-c*.rpm | cpio -vimd

RUN yumdownloader -x \*i686 --archlist=x86_64 pcre2
RUN rpm2cpio pcre*.rpm | cpio -vimd

RUN yumdownloader -x \*i686 --archlist=x86_64 libtool-ltdl
RUN rpm2cpio libtool-ltdl*.rpm | cpio -vimd

RUN yumdownloader -x \*i686 --archlist=x86_64 libxml2
RUN rpm2cpio libxml2*.rpm | cpio -vimd

RUN yumdownloader -x \*i686 --archlist=x86_64 bzip2-libs
RUN rpm2cpio bzip2-libs*.rpm | cpio -vimd

RUN yumdownloader -x \*i686 --archlist=x86_64 xz-libs
RUN rpm2cpio xz-libs*.rpm | cpio -vimd

RUN yumdownloader -x \*i686 --archlist=x86_64 libprelude
RUN rpm2cpio libprelude*.rpm | cpio -vimd

RUN yumdownloader -x \*i686 --archlist=x86_64 gnutls
RUN rpm2cpio gnutls*.rpm | cpio -vimd

RUN yumdownloader -x \*i686 --archlist=x86_64 nettle
RUN rpm2cpio nettle*.rpm | cpio -vimd

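# Collect the binaries and libraries into bin/ and lib/, and create a writable directory for the virus definitions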
RUN mkdir -p bin
RUN mkdir -p lib
RUN mkdir -p var/lib/clamav
RUN chmod -R 777 var/lib/clamav

COPY ./freshclam.conf .

RUN cp usr/bin/clamscan usr/bin/freshclam bin/.
RUN cp usr/lib64/* lib/.
RUN cp freshclam.conf bin/freshclam.conf

RUN yum install shadow-utils.x86_64 -y

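# freshclam drops root privileges to a dedicated user, so make sure the clamav/clamupdate users exist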
RUN groupadd clamav
RUN useradd -g clamav -s /bin/false -c "Clam Antivirus" clamav
RUN useradd -g clamav -s /bin/false -c "Clam Antivirus" clamupdate

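# Download the current virus definitions (the handler later points clamscan at var/lib/clamav)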
RUN LD_LIBRARY_PATH=./lib ./bin/freshclam --config-file=bin/freshclam.conf

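# Final stage: the Lambda image with the handler plus the ClamAV binaries and definitions from the build stage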
FROM public.ecr.aws/lambda/nodejs:14

COPY --from=layer-image /home/build ./

COPY handler.js ./

CMD ["handler.virusScan"]

This Dockerfile does two things:

  1. Builds the ClamAV binaries into a stage aliased layer-image along with the ClamAV virus definitions
  2. Builds the Lambda image with the handler itself, and then pulls in the ClamAV binaries and virus definitions from the layer-image stage
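
One thing the Dockerfile references but doesn't show is freshclam.conf. The actual file is in the repository; if you're writing your own, a minimal config along these lines should be enough for freshclam to pull definitions (the exact values here are my assumption, with the database directory pointed at the var/lib/clamav folder created above):

DatabaseMirror database.clamav.net
DatabaseDirectory /home/build/var/lib/clamav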

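Before deploying, you can sanity-check the image locally. The AWS Lambda base images ship with the Runtime Interface Emulator, so something along these lines should work (the image tag is arbitrary, and an empty Records array just confirms the image builds and the handler loads; a real scan would also need AWS credentials available to the container):

docker build -t clambda-av .
docker run --rm -p 9000:8080 clambda-av

# in another terminal, send an empty event to the emulator
curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" \
  -d '{"Records": []}'
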
Handler

This doesn't change the handler much from my previous post:

const { execSync } = require("child_process");
const { writeFileSync, unlinkSync } = require("fs");
const AWS = require("aws-sdk");

const s3 = new AWS.S3();

module.exports.virusScan = async (event, context) => {
  if (!event.Records) {
    console.log("Not an S3 event invocation!");
    return;
  }

  for (const record of event.Records) {
    if (!record.s3) {
      console.log("Not an S3 Record!");
      continue;
    }

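    // Note: object keys in S3 event notifications are URL-encoded (e.g. spaces become '+'),
    // so decode them first if your keys can contain such characters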
    // get the file
    const s3Object = await s3
      .getObject({
        Bucket: record.s3.bucket.name,
        Key: record.s3.object.key
      })
      .promise();

    // write file to disk
    writeFileSync(`/tmp/${record.s3.object.key}`, s3Object.Body);

    try { 
      // scan it
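      // clamscan exits with a non-zero status (1 = virus found), which makes execSync throw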
      execSync(`./bin/clamscan --database=./var/lib/clamav /tmp/${record.s3.object.key}`);

      await s3
        .putObjectTagging({
          Bucket: record.s3.bucket.name,
          Key: record.s3.object.key,
          Tagging: {
            TagSet: [
              {
                Key: 'av-status',
                Value: 'clean'
              }
            ]
          }
        })
        .promise();
    } catch(err) {
      if (err.status === 1) {
        // tag as dirty, OR you can delete it
        await s3
          .putObjectTagging({
            Bucket: record.s3.bucket.name,
            Key: record.s3.object.key,
            Tagging: {
              TagSet: [
                {
                  Key: 'av-status',
                  Value: 'dirty'
                }
              ]
            }
          })
          .promise();
      }
    }

    // delete the temp file
    unlinkSync(`/tmp/${record.s3.object.key}`);
  }
};
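
As a side note, anything that consumes these files later can check the tag before trusting the object. Here's a rough sketch of that (not part of the repo; the bucket and key are just examples) using the same SDK:

const { TagSet } = await s3
  .getObjectTagging({
    Bucket: "clambda-av-files",
    Key: "some-file.pdf"
  })
  .promise();

const avStatus = TagSet.find((tag) => tag.Key === "av-status");
if (!avStatus || avStatus.Value !== "clean") {
  // treat the object as unscanned or dirty
}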

Summary

Compared to our previous adventure (this is the last time I'm linking it, I swear), this approach removes the extra step of building the binaries with a bash script. It also removes the need for a Lambda layer.

If you'd like to check out the full code, again, it's in the GitHub repository. Please don't hesitate to ask questions or leave comments on this article, or to open an issue on the repository if applicable. Thanks for reading!
