Using Serverless to Scan Files with ClamAV in a Lambda Container
Joseph Sutton
Posted on August 19, 2021
In my previous post, I had used a lambda function with a designated lambda layer. The layer's binaries were created within a Docker image based from Amazon's amazonlinux:2
image. We can use those binaries in conjunction with AWS's lambda container images feature without much worry of deployment size limitations as we did with the lambda function and its layer.
History
For those that did not read the previous post, this is going to establish an S3 bucket with an event trigger towards a lambda function. This lambda function will be a container with the handler code and ClamAV binaries and virus definitions. It will get the S3 object via the metadata in the trigger, scan it, and mark it as clean or dirty per the results of the ClamAV scan.
TLDR: Here's the GitHub repository.
Infrastructure
This is obviously going to be different - instead of using a lambda layer, we'll be using a Docker image stored on ECR. This is nearly effortless, thanks to Serverless.
Serverless
By default, Server will create an ECR repository for us and the image will live in it. All we have to do is give it the path of the Dockerfile
.
service: clambda-av
provider:
name: aws
runtime: nodejs14.x
ecr:
images:
clambdaAv:
path: ./
iamRoleStatements:
- Effect: Allow
Action:
- s3:GetObject
- s3:PutObjectTagging
Resource: "arn:aws:s3:::clambda-av-files/*"
functions:
virusScan:
image:
name: clambdaAv
memorySize: 2048
events:
- s3:
bucket: clambda-av-files
event: s3:ObjectCreated:*
timeout: 120
package:
exclude:
- node_modules/**
- coverage/**
Dockerfile
Since we're using Javascript, we'll be using the nodejs14 image
as the base. Unfortunately, we cannot easily install our ClamAV binaries through this image and thus have to use the amazonlinux:2
image, as stated above. Fortunately, Docker allows us do that with ease via multi-stage Docker builds. I've never done this up until now, but it was a pretty quick and interesting process:
FROM amazonlinux:2 AS layer-image
WORKDIR /home/build
RUN set -e
RUN echo "Prepping ClamAV"
RUN rm -rf bin
RUN rm -rf lib
RUN yum update -y
RUN amazon-linux-extras install epel -y
RUN yum install -y cpio yum-utils tar.x86_64 gzip zip
RUN yumdownloader -x \*i686 --archlist=x86_64 clamav
RUN rpm2cpio clamav-0*.rpm | cpio -vimd
RUN yumdownloader -x \*i686 --archlist=x86_64 clamav-lib
RUN rpm2cpio clamav-lib*.rpm | cpio -vimd
RUN yumdownloader -x \*i686 --archlist=x86_64 clamav-update
RUN rpm2cpio clamav-update*.rpm | cpio -vimd
RUN yumdownloader -x \*i686 --archlist=x86_64 json-c
RUN rpm2cpio json-c*.rpm | cpio -vimd
RUN yumdownloader -x \*i686 --archlist=x86_64 pcre2
RUN rpm2cpio pcre*.rpm | cpio -vimd
RUN yumdownloader -x \*i686 --archlist=x86_64 libtool-ltdl
RUN rpm2cpio libtool-ltdl*.rpm | cpio -vimd
RUN yumdownloader -x \*i686 --archlist=x86_64 libxml2
RUN rpm2cpio libxml2*.rpm | cpio -vimd
RUN yumdownloader -x \*i686 --archlist=x86_64 bzip2-libs
RUN rpm2cpio bzip2-libs*.rpm | cpio -vimd
RUN yumdownloader -x \*i686 --archlist=x86_64 xz-libs
RUN rpm2cpio xz-libs*.rpm | cpio -vimd
RUN yumdownloader -x \*i686 --archlist=x86_64 libprelude
RUN rpm2cpio libprelude*.rpm | cpio -vimd
RUN yumdownloader -x \*i686 --archlist=x86_64 gnutls
RUN rpm2cpio gnutls*.rpm | cpio -vimd
RUN yumdownloader -x \*i686 --archlist=x86_64 nettle
RUN rpm2cpio nettle*.rpm | cpio -vimd
RUN mkdir -p bin
RUN mkdir -p lib
RUN mkdir -p var/lib/clamav
RUN chmod -R 777 var/lib/clamav
COPY ./freshclam.conf .
RUN cp usr/bin/clamscan usr/bin/freshclam bin/.
RUN cp usr/lib64/* lib/.
RUN cp freshclam.conf bin/freshclam.conf
RUN yum install shadow-utils.x86_64 -y
RUN groupadd clamav
RUN useradd -g clamav -s /bin/false -c "Clam Antivirus" clamav
RUN useradd -g clamav -s /bin/false -c "Clam Antivirus" clamupdate
RUN LD_LIBRARY_PATH=./lib ./bin/freshclam --config-file=bin/freshclam.conf
FROM public.ecr.aws/lambda/nodejs:14
COPY --from=layer-image /home/build ./
COPY handler.js ./
CMD ["handler.virusScan"]
This Dockerfile does two things:
- Builds the ClamAV binaries into a stage aliased
layer-image
along with the ClamAV virus definitions - Builds the Lambda image with the handler itself, and then pulls in the ClamAV binaries and virus definitions from the
layer-image
stage
Handler
This doesn't change the handler much from my previous post:
const { execSync } = require("child_process");
const { writeFileSync, unlinkSync } = require("fs");
const AWS = require("aws-sdk");
const s3 = new AWS.S3();
module.exports.virusScan = async (event, context) => {
if (!event.Records) {
console.log("Not an S3 event invocation!");
return;
}
for (const record of event.Records) {
if (!record.s3) {
console.log("Not an S3 Record!");
continue;
}
// get the file
const s3Object = await s3
.getObject({
Bucket: record.s3.bucket.name,
Key: record.s3.object.key
})
.promise();
// write file to disk
writeFileSync(`/tmp/${record.s3.object.key}`, s3Object.Body);
try {
// scan it
execSync(`./bin/clamscan --database=./var/lib/clamav /tmp/${record.s3.object.key}`);
await s3
.putObjectTagging({
Bucket: record.s3.bucket.name,
Key: record.s3.object.key,
Tagging: {
TagSet: [
{
Key: 'av-status',
Value: 'clean'
}
]
}
})
.promise();
} catch(err) {
if (err.status === 1) {
// tag as dirty, OR you can delete it
await s3
.putObjectTagging({
Bucket: record.s3.bucket.name,
Key: record.s3.object.key,
Tagging: {
TagSet: [
{
Key: 'av-status',
Value: 'dirty'
}
]
}
})
.promise();
}
}
// delete the temp file
unlinkSync(`/tmp/${record.s3.object.key}`);
}
};
Summary
From our previous adventure (this is the last time I'm linking it, I swear), this removes the extra step of building the binaries with a bash script. It also removes the need for a lambda layer.
If you'd like to check out the full code, again, it's in the GitHub repository. Please do not hesitate to ask questions or post any comments or issues you may have in this article or by opening an issue on the repository if applicable. Thanks for reading!
Posted on August 19, 2021
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.