Akshay Khot
Posted on April 18, 2022
Docker builds images using a Dockerfile that contains all the instructions needed to create an image. Each instruction maps to a read-only layer stacked on top of the previous layer and is a delta of the changes from the previous layer.
Docker caches these layers and reuses them in later builds, which makes building images faster. However, the cache can sometimes cause unintended issues. This post explains a few such cases.
1. Installing Packages from Cached Repositories
Let's say your Dockerfile has the following commands:
RUN apt-get update
RUN apt-get install -y sqlite3
After a month, you want to add nginx to the image, so you update the Dockerfile to:
RUN apt-get update
RUN apt-get install -y sqlite3 nginx
Since a Dockerfile lists instructions in sequence, each instruction builds on top of the previous one. When Docker builds the image, it caches the result of each step. A step's cache is invalidated when the instruction itself changes or, for instructions such as COPY, when the files being copied change. Once one step is invalidated, every step after it is rebuilt as well, so Docker only needs to rebuild from the first instruction in the Dockerfile where there was a change.
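In the above example, the apt-get update instruction didn't change. Docker looks only at the text of a RUN instruction, not at what the command would download, so it reuses the layer it cached the last time you built the image. That layer contains the month-old package index, so apt-get install may install an outdated version of nginx, or fail if that version is no longer available from the mirror.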
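To make this concrete, here is the updated Dockerfile annotated with what Docker would be expected to do on the rebuild. The comments are my reading of the caching rules, not captured build output, and the FROM line is borrowed from the example later in this post so that the snippet is complete.

```dockerfile
# Base image borrowed from the later example; any Debian-based image behaves the same way.
FROM ruby

# The command text is unchanged since the last build, so Docker reuses the
# cached layer, including the month-old package index stored in it.
RUN apt-get update

# The command text changed (nginx was added), so this step is rebuilt.
# It runs against the stale index from the cached layer above.
# -y answers apt's confirmation prompt, since a build has no terminal to read from.
RUN apt-get install -y sqlite3 nginx
```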
To prevent this, always combine the apt-get update and apt-get install commands into a single RUN instruction.
RUN apt-get update && apt-get install -y sqlite3 nginx
Because both commands now share one instruction, changing the package list invalidates the cache for the whole instruction, so Docker fetches a fresh package index whenever you add new packages.
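A slightly more defensive variant of the same idea is sketched below. The one-package-per-line layout, --no-install-recommends, and the cleanup of /var/lib/apt/lists are optional additions of mine rather than part of the fix itself; they keep diffs readable and the layer smaller.

```dockerfile
FROM ruby

# update and install share one RUN, so they are cached and invalidated together.
# -y answers apt's confirmation prompt, since a build has no terminal to read from.
# --no-install-recommends skips optional packages; the final rm drops the
# downloaded package index so it does not end up in the layer.
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        nginx \
        sqlite3 \
    && rm -rf /var/lib/apt/lists/*
```

Listing one package per line means that adding a package later shows up as a single-line change, and that change is enough to rerun the whole instruction.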
2. Installing Gems When Not Needed
Let's assume you fixed the above issue, and now your Dockerfile looks like this:
FROM ruby
RUN apt-get update && apt-get install -y sqlite3 nginx
COPY . /app/
WORKDIR /app
RUN bundle install
Now edit any file in the project, say routes.rb, and rebuild the image. You'll notice that even though you didn't touch the Gemfile, Docker reinstalled all the gems.
The culprit is the third instruction, COPY . /app/, which copies the contents of your current directory into the image. After you edited routes.rb, Docker compared the files being copied with the ones it copied in the previous build. Since they were different, Docker invalidated the cache for that instruction.
As mentioned earlier, once an instruction's cache is invalidated, the caches for all subsequent instructions are invalidated too. Hence Docker reinstalls all the gems, as it can't guarantee that the filesystem change didn't add or remove a gem in the Gemfile. However, this is slow and redundant. Changing an unrelated file in your project shouldn't trigger bundle install.
We can solve this problem by separating the files that trigger bundle install from those that don't.
FROM ruby
RUN apt-get update && apt-get install -y sqlite3 nginx
# copy Gemfile and Gemfile.lock
COPY Gemfile* /app/
WORKDIR /app
RUN bundle install
# copy remaining files
COPY . /app/
Copying the Gemfile in its own instruction creates a separate layer that is cached independently. Its cache is not invalidated when you later change something in the routes.rb file.
The gems will be reinstalled only if you change the Gemfile or Gemfile.lock. Changes to any other file invalidate only the cache of the final COPY instruction, which is fine, as Docker has already installed the gems at this point.
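As a rough illustration, this is how the reordered Dockerfile would be expected to behave on a rebuild after editing only routes.rb. The hit and miss comments are my annotations based on the rules described above, not captured build output.

```dockerfile
FROM ruby

# cache hit: the command text is the same as in the previous build
RUN apt-get update && apt-get install -y sqlite3 nginx

# cache hit: Gemfile and Gemfile.lock have the same contents as before
COPY Gemfile* /app/
WORKDIR /app

# cache hit: nothing above changed, so the layer with the installed gems is reused
RUN bundle install

# cache miss: routes.rb changed, so only this step is rebuilt
COPY . /app/
```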
If you change something and rebuild the image, it will now build quickly as it doesn't have to re-install the gems. Neat.
Hope that helps.