Docker Tips And Best Practices

Meir Gabay (unfor19)

Posted on February 11, 2021

Curious about Docker? Eager to strengthen your skills with containers?

In this blog post, I'll share five tips, tricks, and best practices for using Docker. Let's start with a short analogy for everything that will be covered.

Analogy

The Dockerfile is a recipe for creating Docker images. Hence it should be treated as if it's the recipe for your favorite cake 🍰. It should be concise, readable, and easy to follow; this will make the whole baking (development) process easier.

As part of writing an "easy" recipe (Dockerfile), it's important to enable baking (building) the cake (Docker image) in any kitchen (machine and any UID:GID). After all, if the cake is so good, we'll want to bake (build) the same cake (Docker image) over and over again, anywhere, and speed up 🚀 the baking (build) process over time by memorizing parts of the recipe (layers) in our heads (cache).

It's best to split ✂️ the baking (building) process into steps (multi-stage build), where the final product (Docker image) includes only the relevant contents. We don't want to serve (publish) the cake (Docker image) with a bag of sugar (source-code) or with an oven (build packages/compilers) as it might be embarrassing (and heavy). 🙄

Other than that, keeping the cake (Docker image/container) secured and safe 🔒 from unwanted people or animals 🐈 (hackers) should be taken care of as part of the process of baking the cake (writing a Dockerfile).

And finally, if the cake's recipe (Dockerfile) contains reusable keywords (ARG) such as "double sweet" 🍫 for "2 sugar", and it is used repeatedly in the recipe (Dockerfile), it should be declared once at the top of the recipe (Dockerfile Global ARGs) which will make it possible to use it as a reference ($MY_ARG).

Enough with that.


Topics

  1. Order Of Commands
  2. Multi-Stage Build
  3. Run As A Non-Root User
  4. Mind The UID GID
  5. Global ARGs

Order Of Commands

A Dockerfile instruction (ARG, ENV, RUN, etc.) that shouldn't be re-executed whenever the source code changes should be placed as close to the top as possible. Comparing to cakes: the base of a cake is its bottom layer, while in a Dockerfile, the base of the image is at the top of the file.

The cache of the "requirements packages" layer should be purged only when a package is added, removed, or its version changes, and not when something in the code changes, because that happens a lot.

In the following code snippet, the source code is copied into the image before the requirements (packages) are installed. This means that every time a source-code file is modified, all the "requirements packages" are reinstalled. Any change in the source code purges the cache of the "requirements packages" layer, which is bad since we want to keep them cached.

# BAD

WORKDIR /code/

# Copy everything from the build context
COPY . /code/

# Install packages - re-runs on any change in the source-code
RUN pip install --user -r "requirements.txt"

A good example of caching the requirements layer is to first copy the requirements.txt file, or any other lock-file (package-lock.json, yarn.lock, go.mod, etc.), then install the requirements, and only then copy the source code.

# GOOD

WORKDIR /code/

# Copy and install requirements - only if requirements.txt was changed
COPY requirements.txt /code/
RUN pip install --user -r "requirements.txt"

# Copy everything from the build context
COPY . /code/

Now there's an "extra" COPY command, and requirements.txt ends up being copied twice (once on its own and once as part of the build context). This might look like a bad thing if you're seeing it for the first time, but its beauty is that the installation of the "requirements packages" stays cached, and only then is the source code copied. Amazing!

NOTE: Docker reuses a cached layer as long as the instruction and the files it depends on haven't changed; once a layer's cache is invalidated, every layer after it is rebuilt. This is why the order of RUN, WORKDIR, and COPY is crucial.
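
To see the caching in action, you can build twice and touch a source file in between; with BuildKit, steps served from the cache are marked CACHED in the build output (the image tag myapp is just a placeholder):

$ docker build -t myapp .
$ touch app.py            # simulate a source-code change
$ docker build -t myapp . # the pip install layer is still cached; only COPY . onward re-runs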

Multi-Stage Build

Multi-Stage Build enables releasing slim images, including only packages and artifacts the application needs.

Let's investigate the following Dockerfile

# BAD - Not that bad, but it could be better

FROM python:3.9.1-slim

# Upgrade pip and then install build tools
RUN pip install --upgrade pip && \
    pip install --upgrade setuptools wheel check-wheel-contents

# Copy and install requirements - better caching
COPY requirements.txt /code/
RUN pip install --user -r "requirements.txt"

# Copy everything from the build context
COPY . /code/

### Build the application
### COMMANDS ...

ENTRYPOINT ["app"]

A few things about this solution

  1. Includes packages that are used only during the build stage, for example, setuptools, wheel, and check-wheel-contents
  2. The source code is included in the image
  3. The container will run as the root user; I'll cover it in the next topic

With a multi-stage build, it's possible to create an intermediate stage, let's call it build, which includes the source code and the packages required for building. The build stage is followed by the app stage, the "final image" that will be published to a Docker registry (DockerHub, ECR, ACR, GCR, etc.) and eventually deployed to Cloud or On-Premises infrastructure.

Now let's break the above snippet into a Multi-Stage Build pattern.

# GOOD

FROM python:3.9.1-slim as build

# Upgrade pip and then install build tools
RUN pip install --upgrade pip && \
    pip install --upgrade setuptools wheel check-wheel-contents

### Consider the comments as commands
# Copy and install requirements - better caching
# Copy the application from Docker build context to WORKDIR
# Build the application, validate wheel contents and install the application


FROM python:3.9.1-slim as app

WORKDIR /myapp/
COPY --from=build /dist/ /myapp/

ENTRYPOINT ["app"]

In general, the last FROM command in a Dockerfile produces the final image. This is why it's named app (or prod), and why we make sure it contains only the relevant contents. I named it app even though the name isn't referenced anywhere else in the code; it's purely for readability and documentation.

NOTE: If you're curious why I didn't need to install anything in the final image, it's because the build process includes all the packages in the /dist/lib directory. This is by design, and I totally recommend adopting this practice.
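
To make the commented-out build steps above more concrete, here's a minimal sketch of what they might look like, assuming a setup.py-based project; the exact commands in frigga differ, and the wheel-building commands and /dist/ layout below are illustrative assumptions:

FROM python:3.9.1-slim as build

RUN pip install --upgrade pip && \
    pip install --upgrade setuptools wheel check-wheel-contents

# Copy and install requirements - better caching
WORKDIR /code/
COPY requirements.txt /code/
RUN pip install --user -r "requirements.txt"

# Copy the application from Docker build context to WORKDIR
COPY . /code/

# Build a wheel, validate its contents, then install the application into /dist/
RUN python setup.py bdist_wheel && \
    check-wheel-contents dist/ && \
    pip install --target /dist/ dist/*.whl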

Run As A Non-Root User

The code snippets above didn't mention anything about which user is running the commands. The default user is root, so all the commands to build the application are executed with superuser permissions, which is okay since this stage is done behind the scenes. What troubles me is: why should the user who runs the application (container) be allowed to execute everything as a superuser (root)?

Picture this - your application is running in the cloud, and you haven't followed the principle of least privilege.

John, the nifty hacker, was able to hack into your application. Do you realize that John can execute apt-get install ANYTHING? If John is really good at what he's doing, he can access any back-end service exposed to your application. Take some "negligible" service, such as your database: John can install a mysql client and communicate with your database directly.


To solve this problem, you can use the USER command in the Dockerfile to switch the user from root to some appuser whose sole purpose (and permission) is to execute the application, nothing more.

Omitting the build stage, let's focus on the app stage

# GOOD

FROM python:3.9.1-slim as app

WORKDIR /myapp/

# Creates `appuser` and `appgroup` and sets permissions on the app's directory
RUN addgroup --gid 1000 appgroup && \
    useradd --uid 1000 --gid appgroup --home-dir /myapp/ appuser && \
    chown -R appuser:appgroup /myapp/


# All the following commands will be executed by `appuser`, instead of `root`
USER appuser

# Copy artifacts from the build stage and set `appuser` as the owner
COPY --from=build --chown=appuser:appgroup /dist/ /myapp/

ENTRYPOINT ["app"]

Back to John, the nifty hacker; John tries to execute apt-get install ANYTHING and fails, since apt-get requires superuser permissions. John tries to write malicious code in /root/ and gets permission denied, because this directory's permissions are 700: read, write, and execute for the owner (root) only, and nothing more.


I'm sure that if John is very talented, he'll still be able to do some harm, but it's best to minimize the collateral damage and isolate applications as much as possible. We also don't want John laughing at the fact that we could've prevented him from using apt-get install ANYTHING and simply didn't.
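
By the way, a quick way to check which user a container runs as is to override the entrypoint with id; the output below reflects frigga's appuser (UID:GID 1000:1000), which we'll meet in the next topic:

$ docker run --rm --entrypoint=id unfor19/frigga
# uid=1000(appuser) gid=1000(appgroup) groups=1000(appgroup)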

Mind The UID GID

As you can see in the code snippet above, I used --uid 1000 and --gid 1000. The values 1000:1000 are the defaults for the first user and group created on Ubuntu, and I used 1000:1000 because I'm on WSL2 Ubuntu 20.04, so I could've just omitted those arguments. Here's what my user looks like

$ cat /etc/passwd | grep "$(whoami)"
myuser:x:1000:1000:,,,:/home/myuser:/bin/bash
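A shorter way to get the same numbers is the standard id utility:

$ id -u && id -g
1000
1000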

If the numbers on your machine are not the same, adjusting them with --uid UID and --gid GID will ease the development process. Sounds interesting, right? ...
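
One way to make the IDs adjustable is to expose them as build arguments; a minimal sketch (the UID and GID argument names are mine for illustration, not taken from frigga):

ARG UID=1000
ARG GID=1000
RUN addgroup --gid "$GID" appgroup && \
    useradd --uid "$UID" --gid appgroup --home-dir /myapp/ appuser

# Build the image with your local machine's IDs
$ docker build --build-arg UID="$(id -u)" --build-arg GID="$(id -g)" -t myapp .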


I'll use a real containerized Python application; here's the Dockerfile of unfor19/frigga (yes, yes, I wrote it). Imagine that I hadn't used the USER command in the Dockerfile; let's imagine it together by forcing the container to run as root with docker run --user=root ...

# BAD
# Reminder - My machine's UID:GID is 1000:1000

# root UID:GID is 0:0

$ docker run --rm -it -v $PWD/:/code/ --user=root --workdir=/code/ --entrypoint=bash unfor19/frigga

root@987c5784a52e:/code# cat /etc/passwd | grep "$(whoami)"
root:x:0:0:root:/root:/bin/bash
# UID:GID = 0:0

root@987c5784a52e:/code# echo "root contents" > root-file.txt
root@987c5784a52e:/code# ls -lh root-file.txt
# -rw-r--r-- 1 root root 14 Feb 12 14:03 root-file.txt
root@987c5784a52e:/code# exit

# Local machine
$ ls -lh root-file.txt 
# -rw-r--r-- 1 root root 14 Feb 12 14:04 root-file.txt

$ echo "more contents" >> root-file.txt
# bash: root-file.txt: Permission denied

The above could be worked around with sudo, with one caveat: sudo echo "more contents" >> root-file.txt would still fail, because the output redirection is performed by the unprivileged shell, not by sudo. tee does the trick:

$ echo "more contents" | sudo tee -a root-file.txt
# success

But do we really want to use sudo for editing files? What about our IDE? Do we need to run it with sudo to edit files? I hope not. A better approach is to adjust the application's (container's) UID:GID to match the local machine's UID:GID. In my case, I didn't even have to use --uid and --gid in the Dockerfile, since my local IDs already match the ones the application (container) uses.

# GOOD
# Reminder - My machine's UID:GID is 1000:1000

# frigga's user UID:GID - 1000:1000

$ docker run --rm -it -v $PWD/:/code/ --workdir=/code/ --entrypoint=bash unfor19/frigga

appuser@52ad885a9ad5:/code$ echo "file contents" > some-file.txt
appuser@52ad885a9ad5:/code$ ls -lh some-file.txt
# -rw-r--r-- 1 appuser appgroup 14 Feb 12 14:15 some-file.txt
appuser@52ad885a9ad5:/code$ exit

# Local machine
$ ls -lh some-file.txt 
# -rw-r--r-- 1 meir meir 14 Feb 12 14:16 some-file.txt

$ echo "more contents" >> some-file.txt
# success

The file some-file.txt is set with the following permissions: rw-r--r-- (644), so only the file's owner can edit it. Luckily (or is it?), my UID and GID are also 1000, so I'm able to edit the file with my current user, without adding sudo every time.
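
If an image's built-in IDs don't match yours and rebuilding isn't an option, the IDs can also be overridden at runtime; one caveat is that the resulting user won't exist in the container's /etc/passwd, which some applications dislike:

$ docker run --rm -it -v $PWD/:/code/ --user "$(id -u):$(id -g)" --workdir=/code/ --entrypoint=bash unfor19/frigga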

Global ARGs

Going back to the Dockerfile - it's possible to declare global ARGs and pass them along to the stages. This helps with following the Don't Repeat Yourself (DRY) principle. For example, providing PYTHON_VERSION as a global argument, instead of hardcoding it for each stage, is superb! Let's see it in action.

# BAD - 3.9.1 is hardcoded

FROM python:3.9.1-slim as build
# Build stage commands

FROM python:3.9.1-slim as app
# App stage commands
ENTRYPOINT ["app"]

Consider this instead-

# GOOD - 3.9.1 is declared once at the top of the file
ARG PYTHON_VERSION="3.9.1"

FROM python:"$PYTHON_VERSION"-slim as build
# Build stage commands

FROM python:"$PYTHON_VERSION"-slim as app
# App stage commands
ENTRYPOINT ["app"]
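Two things worth knowing about global ARGs: they can be overridden at build time with --build-arg, and an ARG declared before the first FROM is only visible in FROM lines, so to use its value inside a stage, redeclare it there (myapp is a placeholder tag):

# Override the Python version at build time
$ docker build --build-arg PYTHON_VERSION="3.9.2" -t myapp .

# In the Dockerfile - redeclare the ARG to use it within a stage
ARG PYTHON_VERSION="3.9.1"
FROM python:${PYTHON_VERSION}-slim as app
ARG PYTHON_VERSION
RUN echo "Building with Python $PYTHON_VERSION"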

Final Words

If you're still here, it means you're really into it. For a full example of a containerized Python application, essentially a CLI, see unfor19/frigga. I've implemented all the best practices I could think of in this project. To take it even further, check this project's GitHub Actions (CI/CD); I added a full-blown test-suite to make sure that frigga runs on both docker-compose and Kubernetes, so you might find it handy.

That would be all. Feel free to ask questions or leave a comment with your best practices for using Docker.

