Rui Trigo
Posted on February 5, 2021
Docker builds images automatically by reading the instructions from a Dockerfile -- a text file that contains all commands, in order, needed to build a given image.
The explanation above was extracted from Docker’s official docs and summarizes what a Dockerfile is for. Dockerfiles are important to work with because they are our blueprint, our record of layers added to a Docker base image.
We will learn how to take advantage of BuildKit features, a set of enhancements introduced in Docker v18.09. Integrating BuildKit will give us better performance, storage management, and security.
Objectives
- decrease build time;
- reduce image size;
- gain maintainability;
- gain reproducibility;
- understand multi-stage Dockerfiles;
- understand BuildKit features.
Pre-requisites
- knowledge of Docker concepts
- Docker installed (currently using v19.03)
- a Java app (for this post I used a sample Jenkins Maven app)
Let's get to it!
Simple Dockerfile example
Below is an example of an unoptimized Dockerfile containing a Java app. This example was taken from this DockerCon conference talk. We will walk through several optimizations as we go.
FROM debian
COPY . /app
RUN apt-get update
RUN apt-get -y install openjdk-11-jdk ssh emacs
CMD ["java", "-jar", "/app/target/my-app-1.0-SNAPSHOT.jar"]
Here, we may ask ourselves: how long does it take to build at this stage? To answer it, let's create this Dockerfile on our local development computer and tell Docker to build the image.
# enter your Java app folder
cd simple-java-maven-app-master
# create a Dockerfile
vim Dockerfile
# write content, save and exit
docker pull debian:latest # pull the source image
time docker build --no-cache -t docker-class . # ignore previously cached layers
# notice the build time
0,21s user 0,23s system 0% cpu 1:55,17 total
Here’s our answer: our build takes 1m55s at this point.
But what if we just enable BuildKit with no additional changes? Does it make a difference?
Enabling BuildKit
BuildKit can be enabled with two methods:
- Setting the DOCKER_BUILDKIT=1 environment variable when invoking the Docker build command, such as:
time DOCKER_BUILDKIT=1 docker build --no-cache -t docker-class .
- Enabling BuildKit by default by setting the buildkit feature to true in the daemon configuration file, /etc/docker/daemon.json, and restarting the daemon:
{ "features": { "buildkit": true } }
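A variant of the first method, assuming you want BuildKit for every build in the current shell session without touching the daemon configuration, is to export the variable once:

```shell
# Enable BuildKit for every docker build command run from this shell session
export DOCKER_BUILDKIT=1
echo "DOCKER_BUILDKIT=${DOCKER_BUILDKIT}"
```

The setting only lasts for the current session; new terminals start without it unless you add the export to your shell profile.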
BuildKit Initial Impact
time DOCKER_BUILDKIT=1 docker build --no-cache -t docker-class .
0,54s user 0,93s system 1% cpu 1:43,00 total
On the same hardware, the build took ~12 seconds less than before (1m43s instead of 1m55s). That is roughly a 10.6% reduction in build time with almost no effort.
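The saving can be checked with a quick calculation from the two wall-clock totals reported above, converted to seconds (115.17 s and 103.00 s). This is a minimal sketch that assumes awk is installed; the helper name time_saved_pct is made up for this example.

```shell
# Percentage of build time saved: (before - after) / before * 100.
# time_saved_pct is a hypothetical helper name, not a Docker command.
time_saved_pct() {
  LC_ALL=C awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%.2f", (before - after) / before * 100 }'
}

# 1:55.17 is 115.17 s (legacy builder), 1:43.00 is 103.00 s (BuildKit)
pct=$(time_saved_pct 115.17 103.00)
echo "BuildKit saved ${pct}% of the build time"
```

Computed from the full totals rather than the rounded figures, the saving works out to about 10.6%. The same helper can be reused for the later measurements in this post.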
But now let’s look at some extra steps we can take to improve our results even further.
Order from least to most frequently changing
Because order matters for caching, we'll move the COPY command closer to the end of the Dockerfile.
FROM debian
RUN apt-get update
RUN apt-get -y install openjdk-11-jdk ssh emacs
COPY . /app
CMD ["java", "-jar", "/app/target/my-app-1.0-SNAPSHOT.jar"]
Avoid "COPY ."
Opt for more specific COPY arguments to limit cache busts. Only copy what’s needed.
FROM debian
RUN apt-get update
RUN apt-get -y install openjdk-11-jdk ssh vim
COPY target/my-app-1.0-SNAPSHOT.jar /app/
CMD ["java", "-jar", "/app/my-app-1.0-SNAPSHOT.jar"]
Couple apt-get update & install
This prevents using an outdated package cache. Cache them together or do not cache them at all.
FROM debian
RUN apt-get update && \
    apt-get -y install openjdk-11-jdk ssh vim
COPY target/my-app-1.0-SNAPSHOT.jar /app/
CMD ["java", "-jar", "/app/my-app-1.0-SNAPSHOT.jar"]
Remove unnecessary dependencies
Don’t install debugging and editing tools—you can install them later when you feel you need them.
FROM debian
RUN apt-get update && \
    apt-get -y install --no-install-recommends \
    openjdk-11-jdk
COPY target/my-app-1.0-SNAPSHOT.jar /app/
CMD ["java", "-jar", "/app/my-app-1.0-SNAPSHOT.jar"]
Remove package manager cache
Your image does not need this cache data. Take the chance to free some space.
FROM debian
RUN apt-get update && \
    apt-get -y install --no-install-recommends \
    openjdk-11-jdk && \
    rm -rf /var/lib/apt/lists/*
COPY target/my-app-1.0-SNAPSHOT.jar /app/
CMD ["java", "-jar", "/app/my-app-1.0-SNAPSHOT.jar"]
Use official images where possible
There are some good reasons to use official images, such as reducing the time spent on maintenance and reducing the size, as well as having an image that is pre-configured for container use.
FROM openjdk
COPY target/my-app-1.0-SNAPSHOT.jar /app/
CMD ["java", "-jar", "/app/my-app-1.0-SNAPSHOT.jar"]
Use specific tags
Don’t use latest, as it’s a rolling tag. That’s asking for unpredictable problems.
FROM openjdk:8
COPY target/my-app-1.0-SNAPSHOT.jar /app/
CMD ["java", "-jar", "/app/my-app-1.0-SNAPSHOT.jar"]
Look for minimal flavors
You can reduce the base image size. Pick the lightest one that suits your purpose. Below is a short list of openjdk images.
| Repository | Tag | Size |
|---|---|---|
| openjdk | 8 | 634MB |
| openjdk | 8-jre | 443MB |
| openjdk | 8-jre-slim | 204MB |
| openjdk | 8-jre-alpine | 83MB |
Build from a source in a consistent environment
Maybe you do not need the whole JDK. If you intended to use JDK for Maven, you can use a Maven Docker image as a base for your build.
FROM maven:3.6-jdk-8-alpine
WORKDIR /app
COPY pom.xml .
COPY src ./src
RUN mvn -e -B package
CMD ["java", "-jar", "/app/target/my-app-1.0-SNAPSHOT.jar"]
Fetch dependencies in a separate step
A Dockerfile command to fetch dependencies can be cached. Caching this step will speed up our builds.
FROM maven:3.6-jdk-8-alpine
WORKDIR /app
COPY pom.xml .
RUN mvn -e -B dependency:resolve
COPY src ./src
RUN mvn -e -B package
CMD ["java", "-jar", "/app/target/my-app-1.0-SNAPSHOT.jar"]
Multi-stage builds: remove build dependencies
Why use multi-stage builds?
- separate the build from the runtime environment
- DRY
- different details for dev, test, and lint-specific environments
- delinearizing dependencies (concurrency)
- having platform-specific stages
FROM maven:3.6-jdk-8-alpine AS builder
WORKDIR /app
COPY pom.xml .
RUN mvn -e -B dependency:resolve
COPY src ./src
RUN mvn -e -B package
FROM openjdk:8-jre-alpine
COPY --from=builder /app/target/my-app-1.0-SNAPSHOT.jar /
CMD ["java", "-jar", "/my-app-1.0-SNAPSHOT.jar"]
Checkpoint
If you build our application at this point,
time DOCKER_BUILDKIT=1 docker build --no-cache -t docker-class .
0,41s user 0,54s system 2% cpu 35,656 total
you'll notice the build now takes ~35.66 seconds, about 69% less than the 1m55s we started with. It's a pleasant improvement. From here on, we will focus on features that cover more specific scenarios.
Multi-stage builds: different image flavors
The Dockerfile below defines separate release stages for a Debian-based and an Alpine-based image.
FROM maven:3.6-jdk-8-alpine AS builder
…
FROM openjdk:8-jre-jessie AS release-jessie
COPY --from=builder /app/target/my-app-1.0-SNAPSHOT.jar /
CMD ["java", "-jar", "/my-app-1.0-SNAPSHOT.jar"]
FROM openjdk:8-jre-alpine AS release-alpine
COPY --from=builder /app/target/my-app-1.0-SNAPSHOT.jar /
CMD ["java", "-jar", "/my-app-1.0-SNAPSHOT.jar"]
To build the image for a specific stage, we can use the --target argument:
time docker build --no-cache --target release-jessie .
Different image flavors (DRY / global ARG)
ARG flavor=alpine
FROM maven:3.6-jdk-8-alpine AS builder
…
FROM openjdk:8-jre-$flavor AS release
COPY --from=builder /app/target/my-app-1.0-SNAPSHOT.jar /
CMD ["java", "-jar", "/my-app-1.0-SNAPSHOT.jar"]
The ARG command controls which image is built. In the example above, alpine is the default flavor, but we can pass --build-arg flavor=<flavor> to the docker build command.
time docker build --no-cache --target release --build-arg flavor=jessie .
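To produce both flavors in one go, the same command can be wrapped in a loop. The sketch below is only an illustration: the helper names run and build_flavors, the DRY_RUN switch, and the docker-class:<flavor> tags are all assumptions made for this example. By default it only prints the commands, so it can be tried without a Docker daemon.

```shell
# DRY_RUN=1 (the default here) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

# run is a hypothetical helper: print the command in dry-run mode, else execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Build the release stage once per flavor passed as an argument.
build_flavors() {
  for flavor in "$@"; do
    # docker-class:<flavor> is an illustrative tag name
    run docker build --target release --build-arg "flavor=${flavor}" \
      -t "docker-class:${flavor}" .
  done
}

build_flavors alpine jessie
```

Running the script with DRY_RUN=0 executes the two builds for real, one per flavor, each tagged separately.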
Concurrency
Concurrency is important when building Docker images, as it makes the most of the available CPU threads. In a linear Dockerfile, all instructions are executed in sequence. With multi-stage builds, BuildKit can build independent stages in parallel, so smaller dependency stages are ready by the time the main stage needs them.
BuildKit even brings another performance bonus. If stages are not used later in the build, they are directly skipped instead of processed and discarded when they finish. This means that in the stage graph representation, unneeded stages are not even considered.
Below is an example Dockerfile where a website's assets are built in an assets stage:
FROM maven:3.6-jdk-8-alpine AS builder
…
FROM tiborvass/whalesay AS assets
RUN whalesay "Hello DockerCon!" > out/assets.html
FROM openjdk:8-jre-alpine AS release
COPY --from=builder /app/target/my-app-1.0-SNAPSHOT.jar /
COPY --from=assets /out /assets
CMD ["java", "-jar", "/my-app-1.0-SNAPSHOT.jar"]
And here is another Dockerfile where C and C++ libraries are compiled separately and take part in the builder stage later on.
FROM maven:3.6-jdk-8-alpine AS builder-base
…
FROM gcc:8-alpine AS builder-someClib
…
RUN git clone … && ./configure --prefix=/out && make && make install
FROM g++:8-alpine AS builder-someCPPlib
…
RUN git clone … && cmake …
FROM builder-base AS builder
COPY --from=builder-someClib /out /
COPY --from=builder-someCPPlib /out /
BuildKit Application Cache
BuildKit has a special feature for package manager caches. Here are the typical locations of some cache folders:
| Package manager | Path |
|---|---|
| apt | /var/lib/apt/lists |
| go | ~/.cache/go-build |
| go-modules | $GOPATH/pkg/mod |
| npm | ~/.npm |
| pip | ~/.cache/pip |
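We can compare the Dockerfile below with the one presented in the section Build from a source in a consistent environment. That earlier Dockerfile had no special cache handling. We can add it with a mount type called cache, --mount=type=cache, pointed here at Maven's local repository, /root/.m2. On Docker 19.03, RUN --mount is only available when the Dockerfile starts with a syntax directive such as # syntax=docker/dockerfile:1.2; the same applies to the secret and ssh mounts shown later.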
# syntax=docker/dockerfile:1.2
FROM maven:3.6-jdk-8-alpine AS builder
WORKDIR /app
RUN --mount=target=. --mount=type=cache,target=/root/.m2 \
    mvn package -DoutputDirectory=/
FROM openjdk:8-jre-alpine
COPY --from=builder /app/target/my-app-1.0-SNAPSHOT.jar /
CMD ["java", "-jar", "/my-app-1.0-SNAPSHOT.jar"]
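For comparison, here is a more conservative sketch that keeps the COPY steps from the earlier sections and only adds a cache mount for Maven's local repository (/root/.m2, as in the Dockerfile above). It is written as a shell snippet that saves the file as Dockerfile.cache, an arbitrary name chosen for this example. The syntax line at the top assumes Docker 19.03, where RUN --mount needs an external Dockerfile frontend; newer Docker releases may not need it.

```shell
# Write a Dockerfile that caches Maven's local repository between builds.
# Dockerfile.cache is an illustrative file name, not something Docker requires.
cat > Dockerfile.cache <<'EOF'
# syntax=docker/dockerfile:1.2
FROM maven:3.6-jdk-8-alpine AS builder
WORKDIR /app
COPY pom.xml .
COPY src ./src
# /root/.m2 is kept in a BuildKit cache and reused by later builds
RUN --mount=type=cache,target=/root/.m2 mvn -e -B package

FROM openjdk:8-jre-alpine
COPY --from=builder /app/target/my-app-1.0-SNAPSHOT.jar /
CMD ["java", "-jar", "/my-app-1.0-SNAPSHOT.jar"]
EOF

echo "Build with: DOCKER_BUILDKIT=1 docker build -f Dockerfile.cache -t docker-class ."
```

Because the cache lives outside the image layers, the downloaded dependencies survive even when the RUN step itself has to be re-executed, for example after a change in src.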
BuildKit Secret Volumes
To mix in some security features of BuildKit, let's see how secret type mounts are used and some cases they are meant for. The first scenario shows an example where we need to hide a secrets file, like ~/.aws/credentials.
# syntax=docker/dockerfile:1.2
FROM <baseimage>
RUN …
RUN --mount=type=secret,id=aws,target=/root/.aws/credentials,required \
./fetch-assets-from-s3.sh
RUN ./build-scripts.sh
To build this Dockerfile, pass the --secret argument like this:
docker build --secret id=aws,src=$HOME/.aws/credentials .
The second scenario is a way to avoid commands like COPY ./keys/private.pem /root/.ssh/private.pem, as we don't want our SSH keys to be stored in the Docker image after they are no longer needed. BuildKit has an ssh mount type to cover that:
# syntax=docker/dockerfile:1.2
FROM alpine
RUN apk add --no-cache openssh-client git
RUN mkdir -p -m 0700 ~/.ssh && ssh-keyscan github.com >> ~/.ssh/known_hosts
ARG REPO_REF=19ba7bcd9976ef8a9bd086187df19ba7bcd997f2
RUN --mount=type=ssh,required git clone git@github.com:org/repo /work && cd /work && git checkout $REPO_REF
To build this Dockerfile, load your private SSH key into your ssh-agent and pass --ssh=default, where default is the ID of the agent connection to forward (by default, the socket referenced by SSH_AUTH_SOCK).
eval $(ssh-agent)
ssh-add ~/.ssh/id_rsa # this is the SSH key default location
docker build --ssh=default .
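Since --ssh=default relies on a running agent, a small pre-flight check can save a failed build. This is a minimal sketch: the function name ssh_agent_ready is hypothetical, and it only verifies that SSH_AUTH_SOCK points to an existing socket, not that the right key is loaded.

```shell
# Return success only if an ssh-agent socket is reachable from this shell.
# ssh_agent_ready is a hypothetical helper name used for this example.
ssh_agent_ready() {
  [ -n "${SSH_AUTH_SOCK:-}" ] && [ -S "${SSH_AUTH_SOCK}" ]
}

if ssh_agent_ready; then
  echo "agent socket found; ready for: docker build --ssh=default ."
else
  echo "no agent socket; run 'eval \$(ssh-agent)' and 'ssh-add' first"
fi
```

To also confirm that a key is loaded, ssh-add -l lists the identities the agent currently holds.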
Conclusion
This concludes our demo on using Docker BuildKit to optimize your Dockerfiles and consequently speed up your images’ build time.
These speed gains result in much-needed savings in time and computational power, which should not be neglected.
As Charles Duhigg wrote in The Power of Habit: "small victories are the consistent application of a small advantage". You will definitely reap the benefits if you build good practices and habits.