Introducing PipFlow

I recently published a new Python package manager called PipFlow. If you're like me and use Docker for everything (local development and production containers), you might like it.

Let's take a step back and look at the problem it solves. At the end, I'll explain why the newer package managers weren't a good fit for me. Here are my priors:

I believe in making local development mirror production, NOT making production mirror local development.
Therefore, I do everything in Docker (and k8s).
In Docker, the slimmer the image, the better. Also, if you can avoid redundant dependencies and build steps, you should.

With that out of the way, let's add a new package. Requests is cool, so let's install it the old way:

pip install requests

I now have the package on my host operating system, but it's useless there because I develop inside Docker. So let's grab its pinned version and append it to requirements.txt:

pip freeze | grep requests >> requirements.txt
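One caveat with that one-liner: grep matches substrings, so if a package such as requests-oauthlib is also installed, both lines end up in requirements.txt. Here is a minimal Python sketch of an exact-name lookup over pip freeze output instead. The function names and the sample freeze text are illustrative assumptions, not part of pip or PipFlow.

```python
import re
from typing import Optional


def normalize(name: str) -> str:
    """Normalize a project name as PEP 503 does: case-fold and unify -, _ and ."""
    return re.sub(r"[-_.]+", "-", name).lower()


def find_pin(freeze_output: str, package: str) -> Optional[str]:
    """Return the `name==version` line for exactly `package`, or None."""
    wanted = normalize(package)
    for line in freeze_output.splitlines():
        line = line.strip()
        name, sep, _version = line.partition("==")
        if sep and normalize(name) == wanted:
            return line
    return None


# Illustrative `pip freeze` output; `grep requests` would match two lines here.
freeze = "certifi==2019.11.28\nrequests==2.22.0\nrequests-oauthlib==1.3.0\n"
print(find_pin(freeze, "requests"))  # requests==2.22.0
```

To run it against real data, you could capture the freeze output with `subprocess.run([sys.executable, "-m", "pip", "freeze"], capture_output=True, text=True).stdout`.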

My requirements.txt now contains requests==2.22.0. Okay, cool, but now we need to rebuild our Docker image to bake in the new dependency:

docker-compose build app

Everything works great now, but let's take a moment to identify some waste: I just installed a package twice to use it. There must be a better way, right? There is:

pipflow add requests

The command above fetches the latest version of requests, adds it to our requirements file, sorts the file, and rebuilds the Docker image, which is the only installation step that matters. Better yet, pipflow doesn't need to be installed in the Docker image; you install it once on your host operating system.
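To make the workflow concrete, here is a minimal sketch of what an add command like this could look like in Python. It is not PipFlow's actual source: the function names are hypothetical, and it assumes the latest version comes from PyPI's JSON API (https://pypi.org/pypi/<name>/json) and that the Compose service is called app, as in the example above.

```python
import json
import subprocess
import urllib.request
from pathlib import Path
from typing import List


def latest_version(package: str) -> str:
    """Ask PyPI's JSON API for the newest released version of `package`."""
    url = f"https://pypi.org/pypi/{package}/json"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["info"]["version"]


def add_pin(lines: List[str], package: str, version: str) -> List[str]:
    """Drop any existing entry for `package`, append the new pin, sort."""
    kept = [
        line for line in lines
        if line.split("==")[0].strip().lower() != package.lower()
    ]
    kept.append(f"{package}=={version}")
    return sorted(kept, key=str.lower)


def add(package: str, requirements: Path = Path("requirements.txt")) -> None:
    """Hypothetical `add`: pin the latest version, sort, rebuild the image."""
    text = requirements.read_text() if requirements.exists() else ""
    lines = [line for line in text.splitlines() if line.strip()]
    pinned = add_pin(lines, package, latest_version(package))
    requirements.write_text("\n".join(pinned) + "\n")
    # The only install that matters happens inside the image build.
    subprocess.run(["docker-compose", "build", "app"], check=True)
```

Calling `add("requests")` on the host would then cover the manual pin, edit, and rebuild steps in one go; note that it needs network access to reach PyPI.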

PipFlow is really a "pip workflow" that collapses a few steps into one. Inspired by the yarn command-line API, it can also upgrade a single package with pipflow upgrade <package>, remove a package with pipflow remove <package>, and upgrade all packages with pipflow upgrade-all.
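The other commands boil down to small edits of the same file. A hedged sketch, assuming every line of requirements.txt is a plain name==version pin (other lines are dropped) and that package names match case-sensitively. The `latest` argument stands in for whatever version lookup the tool really uses, and all names here are hypothetical.

```python
from typing import Callable, Dict, List

Lookup = Callable[[str], str]  # package name -> latest version


def parse(lines: List[str]) -> Dict[str, str]:
    """Map package name -> pinned version for `name==version` lines."""
    pins: Dict[str, str] = {}
    for line in lines:
        name, sep, version = line.strip().partition("==")
        if sep:
            pins[name.strip()] = version.strip()
    return pins


def render(pins: Dict[str, str]) -> List[str]:
    """Turn the mapping back into sorted `name==version` lines."""
    return [f"{name}=={pins[name]}" for name in sorted(pins, key=str.lower)]


def remove(lines: List[str], package: str) -> List[str]:
    """The idea behind `pipflow remove <package>`: drop the pin if present."""
    pins = parse(lines)
    pins.pop(package, None)
    return render(pins)


def upgrade(lines: List[str], package: str, latest: Lookup) -> List[str]:
    """The idea behind `pipflow upgrade <package>`: re-pin one existing package."""
    pins = parse(lines)
    if package in pins:
        pins[package] = latest(package)
    return render(pins)


def upgrade_all(lines: List[str], latest: Lookup) -> List[str]:
    """The idea behind `pipflow upgrade-all`: re-pin every package."""
    return render({name: latest(name) for name in parse(lines)})


reqs = ["requests==2.21.0", "flask==1.1.0"]
newest = {"requests": "2.22.0", "flask": "1.1.1"}  # pretend lookup results
print(upgrade_all(reqs, newest.__getitem__))  # ['flask==1.1.1', 'requests==2.22.0']
print(remove(reqs, "flask"))                  # ['requests==2.21.0']
```

Keeping the file as a sorted mapping means every command ends with the same render step, so the output stays sorted no matter which command ran.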

So why don't I use one of these newer tools like Poetry or Pipenv? The answer is that these tools solve problems that don't exist in Docker.

Let's read the Pipenv README:

You no longer need to use pip and virtualenv separately. They work together

Virtual environments are not necessary in Docker, because the container already provides isolation and sandboxing. Next time you do ADD venv . in a Dockerfile, ask yourself what problem you're solving.

What about Poetry and other tools that use lock files? Those are useful, aren't they?

Nothing is gained by adding a secondary lock file in Docker. A requirements.txt file is perfectly fine at pinning versions: when every entry is pinned with ==, it is already both your manifest and your lock file. In fact, something is lost: you have to generate the lock file, install the tool in your Docker image, or both. These are wasteful, redundant build steps.
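The manifest-plus-lock-file argument depends on exact pins: a range like flask>=1.0 can resolve to a different version on the next build. Below is a rough Python sketch of a check for that. The regex is a simplification of the real requirement syntax (URL requirements, === pins and pip options such as -r lines are all reported as unpinned), and `unpinned` is a hypothetical helper name.

```python
import re
from typing import List

# Rough shape of an exact pin: name, optional [extras], then ==version.
PIN = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?\s*==\s*[^=\s;]+")


def unpinned(requirements_text: str) -> List[str]:
    """Return requirement lines that are not exact `==` pins."""
    loose = []
    for raw in requirements_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and full-line comments
        line = line.split(" #", 1)[0].strip()  # drop inline comments
        if not PIN.match(line):
            loose.append(line)
    return loose


print(unpinned("requests==2.22.0\nflask>=1.0\n# a comment\ndjango\n"))
# ['flask>=1.0', 'django']
```

An empty result means every top-level dependency is pinned; transitive dependencies are only covered if they are listed too, as they are when the file comes from pip freeze.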

Also remember that when you do RUN pip install -r requirements.txt in Docker, a new image layer is created. This layer contains all the packages added to the file system, and each layer gets a unique sha256 digest (in effect, a lock file mechanism). If you rebuild and neither requirements.txt nor anything before it in the Dockerfile has changed, Docker reuses the cached layer instead of reinstalling. It's a beautiful thing.
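A toy model helps show why the install layer stays cached. This is a simplification, not Docker's actual algorithm (BuildKit in particular differs in the details): each step gets a key built from its parent's key and its instruction text, and ADD/COPY steps also fold in a checksum of the copied content. A RUN step is therefore reused as long as everything before it is unchanged. The function names and file contents are illustrative.

```python
import hashlib
from typing import List, Optional, Tuple

# (instruction text, bytes copied into the image by ADD/COPY, or None)
Step = Tuple[str, Optional[bytes]]


def cache_keys(steps: List[Step]) -> List[str]:
    """Toy cache keys: chain parent key + instruction + copied-content hash."""
    keys: List[str] = []
    parent = ""
    for instruction, content in steps:
        h = hashlib.sha256()
        h.update(parent.encode())
        h.update(b"\0" + instruction.encode())
        if content is not None:
            h.update(b"\0" + hashlib.sha256(content).digest())
        parent = h.hexdigest()
        keys.append(parent)
    return keys


def build(requirements: bytes, source: bytes) -> List[str]:
    """Keys for an abridged Dockerfile shaped like the one in this post."""
    return cache_keys([
        ("FROM python:3.8.0-alpine", None),
        ("ADD requirements.txt .", requirements),
        ("RUN pip install -r requirements.txt", None),
        ("ADD . .", source),
    ])


first = build(b"requests==2.22.0\n", b"app v1")
code_change = build(b"requests==2.22.0\n", b"app v2")
new_dep = build(b"requests==2.22.0\nflask==1.1.1\n", b"app v1")

print(first[2] == code_change[2])  # True: pip install layer reused
print(first[2] == new_dep[2])      # False: requirements changed, reinstall
```

The chaining is also why the order of instructions matters: a change to any step invalidates every key after it.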

Here's what my Dockerfile looks like in a production project that uses pipflow:

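FROM python:3.8.0-alpine

EXPOSE 8000

WORKDIR /app

# Create an unprivileged user to run the app as (see USER below)
RUN addgroup -S app && adduser -S -G app app

# Copy only the requirements first, so the install layer below stays cached
# until requirements.txt changes
ADD requirements.txt .

RUN pip install -r requirements.txt

# Now copy the rest of the source code
ADD . .

USER app

ENTRYPOINT ["scripts/entrypoint.sh"]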

You can find the repo on GitHub and the PyPI package here, or just install it with pip install pipflow. PipFlow makes only two assumptions: your Docker image has pip installed, and your Dockerfile contains these two lines (ideally before you add the rest of the source code, for better layer caching):

ADD requirements.txt .

RUN pip install -r requirements.txt
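If you want to confirm a Dockerfile meets those assumptions before trying PipFlow, a small check is easy to sketch. These are hypothetical helpers, not a feature PipFlow is documented to have. They match the two lines literally (also accepting COPY in place of ADD), so a variant such as pip install --no-cache-dir would need a looser match.

```python
from typing import List, Optional


def instructions(dockerfile: str) -> List[str]:
    """Non-empty, non-comment lines with runs of whitespace collapsed."""
    lines = []
    for raw in dockerfile.splitlines():
        line = " ".join(raw.split())
        if line and not line.startswith("#"):
            lines.append(line)
    return lines


def first_index(lines: List[str], *candidates: str) -> Optional[int]:
    """Position of the first line equal to any candidate, or None."""
    for i, line in enumerate(lines):
        if line in candidates:
            return i
    return None


def has_required_lines(dockerfile: str) -> bool:
    """The two required lines are present, in that order."""
    lines = instructions(dockerfile)
    add_req = first_index(lines, "ADD requirements.txt .", "COPY requirements.txt .")
    install = first_index(lines, "RUN pip install -r requirements.txt")
    return add_req is not None and install is not None and add_req < install


def cache_friendly(dockerfile: str) -> bool:
    """The ideal case: the install also happens before the source is copied."""
    lines = instructions(dockerfile)
    install = first_index(lines, "RUN pip install -r requirements.txt")
    add_src = first_index(lines, "ADD . .", "COPY . .")
    if not has_required_lines(dockerfile) or install is None:
        return False
    return add_src is None or install < add_src


good = """FROM python:3.8.0-alpine
WORKDIR /app
ADD requirements.txt .
RUN pip install -r requirements.txt
ADD . .
"""
print(has_required_lines(good), cache_friendly(good))  # True True
```

Splitting the check in two mirrors the post: the two lines are required, while placing them before the source copy is only recommended.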

Let me know if you love it, hate it, or ¯\_(ツ)_/¯. It's a young project, so expect a few kinks 😃.

✨iMichael✨
