Itamar Turner-Trauring
Posted on March 27, 2020
When it's time to package up your Python application into a Docker image, the natural thing to do is search the web for some examples. And a quick search will provide you with plenty of simple, easy examples.
Unfortunately, these simple, easy examples are often broken in a variety of ways, some obvious, some less so. To demonstrate just some of the ways they're broken, I'm going to:
- Start with an example Dockerfile that comes up fairly high on some Google searches.
- Show how it's broken.
- Give some suggestions on how to make it less broken.
Broken by default
Consider the following Dockerfile, which I found by searching for Python Dockerization examples. I've made some minor changes to disguise its origin, but otherwise it is the same:
# DO NOT USE THIS DOCKERFILE AS AN EXAMPLE, IT IS BROKEN
FROM python:3
COPY yourscript.py /
RUN pip install flask
CMD [ "python", "./yourscript.py" ]
Some of the problems with this Dockerfile
How many different problems can you spot in this image?
Problem #1: Non-reproducible builds re Python version
The first thing to notice is that this Dockerfile is based on the python:3 image. At the time of writing this will install Python 3.7, but at some point it will switch to installing Python 3.8.
At that point rebuilding the image will switch to a different version of Python, which might break the software: a minor change in your code can lead to a deploy that breaks production.
Solution: Use python:3.7.3-stretch as the base image, to pin the version and OS. Or use python:3.7-stretch if you're feeling less worried about point releases. See my article on choosing a base image for Python for more details on image variants.
Problem #2: Non-reproducible builds re dependencies
Similarly, flask is installed with no version specified, so each rebuild of the image may install a different version of flask (or of one of its dependencies, or of one of its dependencies' dependencies). If the new versions are compatible, great, but there's no guarantee that is the case.
Solution: Create requirements.txt with transitively-pinned versions of all dependencies, e.g. by using pip-tools, poetry, or Pipenv.
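As a rough sketch of how the pinned file gets used, assuming it was generated outside the image with pip-tools (a hypothetical requirements.in listing only flask, compiled with pip-compile), the install step might look like this:

```dockerfile
# Sketch only, not a complete Dockerfile.
# Assumption: requirements.txt was produced beforehand, e.g. by running
#   pip-compile requirements.in
# where requirements.in (a hypothetical file) contains the single line "flask".
COPY requirements.txt /tmp/

# Installs the versions listed in the file; the result is reproducible only
# if every entry, including transitive dependencies, is pinned with ==.
RUN pip install -r /tmp/requirements.txt
```

The point is that the version choice moves out of the image build and into a file you review and commit.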
Problem #3: Changes to source code invalidate the build cache
If you want fast builds, you want to rely on Docker's layer caching. But because the file is copied in before pip install runs, any change to yourscript.py invalidates the cache for every later layer, so the dependencies are reinstalled on each such rebuild.
Solution: Copy in files only when they're first needed.
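One way to read that advice, sketched here under the assumption that dependencies live in a requirements.txt as suggested for Problem #2, is to order the instructions from least to most frequently changing:

```dockerfile
FROM python:3.7.3-stretch

# Changes only when dependencies change, so this layer and the
# pip install below are normally served from the build cache.
COPY requirements.txt /tmp/
RUN pip install -r /tmp/requirements.txt

# Changes on most builds; only this layer and the ones after it
# are rebuilt when the source code is edited.
COPY yourscript.py /
CMD [ "python", "./yourscript.py" ]
```

With this ordering, editing yourscript.py no longer forces pip install to run again.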
Problem #4: Running as root, which is insecure
By default Docker containers run as root, which is a security risk.
Solution: It's much better to run as a non-root user, and to do so in the image itself, so that you don't listen on ports below 1024 or do other operations that require a subset of root's permissions.
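A minimal sketch of what that might look like on a Debian-based image such as python:3.7.3-stretch, where useradd is available (other base images may need a different command):

```dockerfile
# Sketch only, meant to follow the steps that need root (such as pip install).
# Create an unprivileged user along with its home directory.
RUN useradd --create-home appuser

# Work from a directory that the new user owns.
WORKDIR /home/appuser

# Every later RUN, CMD, and ENTRYPOINT runs as appuser instead of root.
USER appuser
```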
A somewhat better image
Here's a somewhat better—though still not ideal—Dockerfile that addresses the issues above:
FROM python:3.7.3-stretch
COPY requirements.txt /tmp/
RUN pip install -r /tmp/requirements.txt
RUN useradd --create-home appuser
WORKDIR /home/appuser
USER appuser
COPY yourscript.py .
CMD [ "python", "./yourscript.py" ]
Even if the resulting image were something you'd want to run in production (and it almost certainly isn't!), the image is still insufficient on its own.
For example, you also need to regularly update requirements.txt in a controlled manner, in order to get security updates and bug fixes, and you'll need to regularly rebuild your images without caching to get security updates.
And there are many more improvements you could make to get this closer to a production-ready Python container.
Be careful what you learn from
A broken Docker image can lead to production outages, and building best-practices images is a lot harder than it seems. So don't just copy the first example you find on the web: do your research, and spend some time reading about best practices.