Introducing PipFlow
✨iMichael✨
Posted on January 11, 2020
I recently published a new Python package manager called PipFlow. If you're like me and use Docker for everything (local development + containers in production), you might like it.
Let's take a step back and look at the problem it solves; at the end, I'll explain why the newer package managers weren't a good fit for me. Here are my priors:
- I believe in making local development mirror production, NOT making production mirror local development.
- Therefore I do everything in Docker (and k8s).
- In Docker, the slimmer the image, the better. And if you can avoid redundant dependencies and build steps, you should.
With that out of the way, let's add a new package. Requests is cool, so let's install it the old way:
pip install requests
I now have this package in my host operating system, but it's useless there because I use Docker locally. So let's capture the installed version and append it to my requirements.txt file:
pip freeze | grep requests >> requirements.txt
My requirements file now contains requests==2.22.0. Okay, cool, but now we need to rebuild our Docker image to bake in the new dependency:
docker-compose build app
Everything works great now, but let's take a moment to identify some waste: I just installed a package twice to use it. There must be a better way, right? There is:
pipflow add requests
The command above grabs the latest requests version, adds it to our requirements file, sorts the file, and rebuilds the Docker image, which is the only installation step that matters. The great thing is that pipflow never needs to be installed in our Dockerfile, just once in our host operating system.
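Under the hood, pipflow simply automates the manual workflow we walked through above. As a rough sketch (the exact internals may differ), pipflow add requests boils down to:

pip install --upgrade requests                          # resolve and install the latest version on the host
pip freeze | grep -i '^requests==' >> requirements.txt  # pin the resolved version
sort -u -o requirements.txt requirements.txt            # keep the file sorted and de-duplicated
docker-compose build app                                # rebuild the image: the only install that matters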
Pipflow is really a "pip workflow" that collapses a few steps into one. Inspired by the yarn command-line API, pipflow can also upgrade a single package, remove a package, or upgrade everything at once:

pipflow upgrade <package>
pipflow remove <package>
pipflow upgrade-all
So why don't I use one of these newer tools like Poetry or Pipenv? The answer is that these tools solve problems that don't exist in Docker.
Let's read the Pipenv README:
"You no longer need to use pip and virtualenv separately. They work together."
Virtual environments are not necessary in Docker; Docker already provides isolation and sandboxing. The next time you find yourself doing ADD venv . in Docker, ask yourself what problem you're solving.
What about Poetry and other tools that use lock files? Those are useful, aren't they?
Nothing is gained by using a secondary lock file in Docker. A requirements.txt file is perfectly fine at pinning versions; it is your manifest and lock file already. In fact, there is a loss: you have to generate the lock file and/or install the tool in your Docker image. Both are wasteful, redundant build steps.
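To make that concrete, here's what a fully pinned requirements.txt looks like after a pip freeze (the versions below are just illustrative). It is already a complete, reproducible manifest:

certifi==2019.11.28
chardet==3.0.4
idna==2.8
requests==2.22.0
urllib3==1.25.7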
Another thing to remember: when you do RUN pip install -r requirements.txt in Docker, a new image layer is created. This layer represents all the new packages added to the file system, and each layer gets a unique sha256 digest (in effect, a lock-file mechanism). If you build again and nothing has changed in your requirements, the layer is cached. It's a beautiful thing.
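You can inspect those digests yourself. Assuming your image is tagged app, the following lists the sha256 of every layer; rebuild without touching requirements.txt and the pip-install layer's digest stays the same:

docker inspect --format '{{json .RootFS.Layers}}' app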
Here's what my Dockerfile looks like in a production project that uses pipflow:
FROM python:3.8.0-alpine
EXPOSE 8000
WORKDIR /app
# create an unprivileged user to run the app
RUN addgroup -S app && adduser -S -G app app
# add requirements first so the pip layer is cached when only source code changes
ADD requirements.txt .
RUN pip install -r requirements.txt
ADD . .
# drop root privileges
USER app
ENTRYPOINT ["scripts/entrypoint.sh"]
You can find the repo on GitHub and the package on PyPI, or just install it with pip install pipflow. The only assumptions are that your Dockerfile/image has pip installed and that your Dockerfile contains these two lines (ideally before you add the rest of the source code, for optimal caching):
ADD requirements.txt .
RUN pip install -r requirements.txt
Let me know if you love it, hate it, or ¯\_(ツ)_/¯. It's a young project, so expect a few kinks 😃.