Using Serverless to Scan Files with a ClamAV Lambda Layer

Update: I've written how to do this with lambda containers as well!

Let's create an environment that scans a file via an S3 event by utilizing ClamAV binaries on a Lambda layer. You can retrieve the full source code at this GitHub repository.

Note: As of 8/11/2021, the unzipped size of the ClamAV binaries and virus definitions are under the limit. This may change in the future.

Serverless config

So, we'll need a few things: an S3 bucket, a function, and a lambda layer. I've also included logging groups, and permissions in the serverless.yml file:



service: clambda-av

provider:
  name: aws
  runtime: nodejs14.x
  iamRoleStatements:
    - Effect: Allow
      Action:
        - s3:GetObject
        - s3:PutObjectTagging
      Resource: "arn:aws:s3:::clambda-av-files/*"

functions:
  virusScan:
    handler: handler.virusScan
    memorySize: 2048
    events:
      - s3: 
          bucket: clambda-av-files
          event: s3:ObjectCreated:*
    layers:
      - {Ref: ClamavLambdaLayer}
    timeout: 120

package:
  exclude:
    - node_modules/**
    - coverage/**

layers:
  clamav:
    path: layer

Dockerfile

Before we deploy this, we need to get our ClamAV binaries. Amazon has their own Docker base image we can use to build these binaries. With the base image, we can start making our binaries. Through the power of trial and error, I've found the necessary binaries that's required. Here's the full Dockerfile:



FROM amazonlinux:2

WORKDIR /home/build

RUN set -e

RUN echo "Prepping ClamAV"

RUN rm -rf bin
RUN rm -rf lib

RUN yum update -y
RUN amazon-linux-extras install epel -y
RUN yum install -y cpio yum-utils tar.x86_64 gzip zip

RUN yumdownloader -x \*i686 --archlist=x86_64 clamav
RUN rpm2cpio clamav-0*.rpm | cpio -vimd

RUN yumdownloader -x \*i686 --archlist=x86_64 clamav-lib
RUN rpm2cpio clamav-lib*.rpm | cpio -vimd

RUN yumdownloader -x \*i686 --archlist=x86_64 clamav-update
RUN rpm2cpio clamav-update*.rpm | cpio -vimd

RUN yumdownloader -x \*i686 --archlist=x86_64 json-c
RUN rpm2cpio json-c*.rpm | cpio -vimd

RUN yumdownloader -x \*i686 --archlist=x86_64 pcre2
RUN rpm2cpio pcre*.rpm | cpio -vimd

RUN yumdownloader -x \*i686 --archlist=x86_64 libtool-ltdl
RUN rpm2cpio libtool-ltdl*.rpm | cpio -vimd

RUN yumdownloader -x \*i686 --archlist=x86_64 libxml2
RUN rpm2cpio libxml2*.rpm | cpio -vimd

RUN yumdownloader -x \*i686 --archlist=x86_64 bzip2-libs
RUN rpm2cpio bzip2-libs*.rpm | cpio -vimd

RUN yumdownloader -x \*i686 --archlist=x86_64 xz-libs
RUN rpm2cpio xz-libs*.rpm | cpio -vimd

RUN yumdownloader -x \*i686 --archlist=x86_64 libprelude
RUN rpm2cpio libprelude*.rpm | cpio -vimd

RUN yumdownloader -x \*i686 --archlist=x86_64 gnutls
RUN rpm2cpio gnutls*.rpm | cpio -vimd

RUN yumdownloader -x \*i686 --archlist=x86_64 nettle
RUN rpm2cpio nettle*.rpm | cpio -vimd

RUN mkdir -p bin
RUN mkdir -p lib
RUN mkdir -p var/lib/clamav
RUN chmod -R 777 var/lib/clamav

COPY ./freshclam.conf .

RUN cp usr/bin/clamscan usr/bin/freshclam bin/.
RUN cp usr/lib64/* lib/.
RUN cp freshclam.conf bin/freshclam.conf

RUN yum install shadow-utils.x86_64 -y

RUN groupadd clamav
RUN useradd -g clamav -s /bin/false -c "Clam Antivirus" clamav
RUN useradd -g clamav -s /bin/false -c "Clam Antivirus" clamupdate

RUN LD_LIBRARY_PATH=./lib ./bin/freshclam --config-file=bin/freshclam.conf

RUN zip -r9 clamav_lambda_layer.zip bin
RUN zip -r9 clamav_lambda_layer.zip lib
RUN zip -r9 clamav_lambda_layer.zip var
RUN zip -r9 clamav_lambda_layer.zip etc

You'll note that the COPY ./freshclam.conf . line implies that there's another file that we need and you'd be correct:



DatabaseMirror database.clamav.net
CompressLocalDatabase yes
ScriptedUpdates no
DatabaseDirectory /home/build/var/lib/clamav

Building the binaries with Docker

Next is our bash script build.sh to run Docker on the Dockerfile above to build and extract the binaries:



#!/bin/bash

rm -rf ./layer
mkdir layer

docker build -t clamav -f Dockerfile .
docker run --name clamav clamav
docker cp clamav:/home/build/clamav_lambda_layer.zip .
docker rm clamav
mv clamav_lambda_layer.zip ./layer

pushd layer
unzip -n clamav_lambda_layer.zip
rm clamav_lambda_layer.zip
popd

Scanning Handler

Furthermore, we also need our handler to do the scanning and tagging of the file. The tagging is more of a placeholder. You can create a separate quarantine bucket or a separate clean bucket - whichever you prefer.

Now, we need our handler and we should be set:



const { execSync } = require("child_process");
const { writeFileSync, unlinkSync } = require("fs");
const AWS = require("aws-sdk");

const s3 = new AWS.S3();

module.exports.virusScan = async (event, context) => {
  if (!event.Records) {
    console.log("Not an S3 event invocation!");
    return;
  }

  for (const record of event.Records) {
    if (!record.s3) {
      console.log("Not an S3 Record!");
      continue;
    }

    // get the file
    const s3Object = await s3
      .getObject({
        Bucket: record.s3.bucket.name,
        Key: record.s3.object.key
      })
      .promise();

    // write file to disk
    writeFileSync(`/tmp/${record.s3.object.key}`, s3Object.Body);

    try { 
      // scan it
      const scanStatus = execSync(`clamscan --database=/opt/var/lib/clamav /tmp/${record.s3.object.key}`);

      await s3
        .putObjectTagging({
          Bucket: record.s3.bucket.name,
          Key: record.s3.object.key,
          Tagging: {
            TagSet: [
              {
                Key: 'av-status',
                Value: 'clean'
              }
            ]
          }
        })
        .promise();
    } catch(err) {
      if (err.status === 1) {
        // tag as dirty, OR you can delete it
        await s3
          .putObjectTagging({
            Bucket: record.s3.bucket.name,
            Key: record.s3.object.key,
            Tagging: {
              TagSet: [
                {
                  Key: 'av-status',
                  Value: 'dirty'
                }
              ]
            }
          })
          .promise();
      }
    }

    // delete the temp file
    unlinkSync(`/tmp/${record.s3.object.key}`);
  }
};

You may be asking yourself on the --database option above, "Wait... why /opt/?" That's because all of the layer files are placed into that directory for the lambda function to use once it's mounted at runtime.

Here's the algorithm:

If a file is clean, then it is tagged with a key/value pair of av-status = 'clean'
If a file is NOT clean (virus), then it is tagged with a key/value pair of av-status = 'dirty'

Since this is for demonstration purposes, you can obviously customize this flow however you'd like. 😀

Deploying

Now that we have all of our files, we can do the following:

Build the binaries via ./build.sh (make sure it's executable via chmod +x build.sh after creation)
Run sls deploy



joseph@bertha > sls deploy
Serverless: Packaging service...
Serverless: Excluding development dependencies...
Serverless: Excluding development dependencies...
Serverless: Creating Stack...
Serverless: Checking Stack create progress...
........
Serverless: Stack create finished...
Serverless: Uploading CloudFormation file to S3...
Serverless: Uploading artifacts...
Serverless: Uploading service clambda-av.zip file to S3 (41 KB)...
Serverless: Uploading service clamav.zip file to S3 (222.57 MB)...
Serverless: Validating template...
Serverless: Updating Stack...
Serverless: Checking Stack update progress...
........................
Serverless: Stack update finished...
Service Information
service: clambda-av
stage: dev
region: us-east-1
stack: clambda-av-dev
resources: 9
api keys:
  None
endpoints:
  None
functions:
  virusScan: clambda-av-dev-virusScan
layers:
  clamav: arn:aws:lambda:us-east-1:**********:layer:clamav:8

Now, let's test it by uploading a clean file to S3. I happen to have a PDF laying around:



joseph@bertha > aws s3 cp ~/document.pdf s3://clambda-av-files/
upload: ../../document.pdf to s3://clambda-av-files/document.pdf

The downside is that clamscan in the Lambda layer takes ~30 seconds or so boot, load the virus definitions, and scan the file with the results. We can check the tag after some time via:



joseph@bertha > aws s3api get-object-tagging --bucket clambda-av-files --key document.pdf

{
    "TagSet": [
        {
            "Key": "av-status",
            "Value": "clean"
        }
    ]
}

We can test the virus scanner with a test virus signature found at EICAR:



X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*

After saving that text to a file called test-virus.pdf. Let's upload it and see what happens:



joseph@bertha > aws s3 cp ~/test-virus.pdf s3://clambda-av-files/
upload: ../../test-virus.pdf to s3://clambda-av-files/test-virus.pdf

After waiting another thirty seconds or so, let's check the tag on it:



joseph@bertha > aws s3api get-object-tagging --bucket clambda-av-files --key test-virus.pdf

{

    "TagSet": [

        {

            "Key": "av-status",

            "Value": "dirty"

        }

    ]

}

Drawbacks and potential problems

There are a few drawbacks to this, especially since the size of the Lambda layer is quite large:

File size constraints due to the /tmp means you cannot upload a file larger than 512MB.
ClamAV virus definitions will no doubt get larger, thus potentially interfering with the maximum deployment size (which is 250MB, including layers, as of 8/11/2021).
Lambda code storage limitation may eventually be reached with consecutive deployments, although this can be mitigated with the serverless-prune-plugin.

Thanks

Thank y'all for reading. If you have any questions, feel free to ask in the comments! I also welcome suggestions. 🙂

Blog