Bundling Python Environments in a ZIP Archive

jhermann

Jürgen Hermann

Posted on March 12, 2020

Shipping dependencies for your scripts as a single file, built with ‘shiv’.

The Basic Idea

If you have a set of Python scripts that all use the same required packages, you can distribute those dependencies as a zipapp, i.e. as a single executable file. See Building Zipapps (PEP 441) for details if you're new to the concept of zipped Python application bundles.
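To make the zipapp concept a bit more tangible, here is a minimal sketch that uses only the standard library's zipapp module, not shiv. The names build_demo_zipapp, demo_src, and demo.pyz are made up for this illustration. It shows that a zipapp is simply a ZIP archive with a __main__.py inside and an interpreter line in front.

```python
import subprocess
import sys
import tempfile
import zipapp
from pathlib import Path


def build_demo_zipapp(workdir: Path) -> Path:
    """Create a tiny zipapp in 'workdir' and return the archive path.

    All names used here (demo_src, demo.pyz) are hypothetical.
    """
    src = workdir / "demo_src"
    src.mkdir()
    # A zipapp needs a __main__.py, which is executed when the archive runs.
    (src / "__main__.py").write_text('print("hello from a zipapp")\n')

    target = workdir / "demo.pyz"
    # The 'interpreter' argument is written as a shebang line in front of
    # the ZIP data, which is what makes the file directly executable.
    zipapp.create_archive(src, target, interpreter=sys.executable)
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        archive = build_demo_zipapp(Path(tmp))
        # Read back the interpreter recorded in the archive's shebang line.
        print("Interpreter:", zipapp.get_interpreter(archive))
        # Run the archive by passing it to a Python interpreter.
        result = subprocess.run([sys.executable, str(archive)],
                                capture_output=True, text=True, check=True)
        print(result.stdout, end="")
```

shiv builds on the same archive format, but additionally bundles the installed dependencies and its own bootstrap code.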

Unlike shipping a script in a virtualenv built within a single project, you can have a project for the base libraries and other projects for the scripts, including scripts written by end users who are just using your dependencies.

You can also deploy any PyPI package that way, with a single call to shiv, as shown in the next section using Pandas.

A Practical Example

The following example uses the well-known Pandas data science library, but this works for any project built with setuptools or any other build tool creating Python packages that declare their requirements.

So, to create your base library release artifact, install and call shiv like this:

python3.8 -m pip install --user shiv
python3.8 -m shiv -p '/usr/bin/python3.8 -IS' \
                  -o ~/bin/_lib-pandas pandas==1.0.1

Do this in a virtualenv and leave out the --user option if you want to keep your account's home directory clean.

Note that we do not provide an entry point here, which means this zipapp drops into the given Python interpreter and is thus usable as an interpreter, with the contained packages available for import.

Now we can exploit this to write a script using the zipapp as its interpreter:

cat >script <<'EOF'
#! /usr/bin/env _lib-pandas
import re
import sys
from pathlib import Path
import pandas as pd

print('Using Pandas from',
      Path(pd.__file__).parent.relative_to(Path.home()),
      '\n\nPython path:')
df = pd.DataFrame(sys.path, columns=['Path'])
df.Path = df.Path.str.replace(f'^{ re.escape(str(Path.home())) }/', '~/', regex=True)
print(df)
EOF
chmod +x script
./script

Calling the script produces the following output:

Using Pandas from .shiv/_lib-pandas_23b2…d2/site-packages/pandas 

Python path:
                                                Path
0                                  ~/bin/_lib-pandas
1                              /usr/lib/python38.zip
2                                 /usr/lib/python3.8
3                     /usr/lib/python3.8/lib-dynload
4  ~/.shiv/_lib-pandas_23b2bb7d64c26139950435a64d...

If you're familiar with Pandas, you'll instantly recognize the Python path output as coming from a Pandas data frame. 🎉

The first execution is a bit slow on startup, because the cache directory you see at the end of the Python path has to be populated first. shiv's bootstrapping code unpacks extension packages containing native code into the file system, so the OS can load them.
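If you want to check whether that cache has already been populated, for example before timing a script, a small helper like the following can list the unpacked environments. This is only a sketch under two assumptions: that the cache lives in ~/.shiv unless the SHIV_ROOT environment variable points elsewhere (which is my understanding of shiv's documentation), and that cache directories are named after the zipapp followed by an underscore and a build ID, as the output above suggests. The function names are made up for this example.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional


def shiv_cache_root(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the directory assumed to hold shiv's extraction cache.

    Assumption: SHIV_ROOT overrides the default location ~/.shiv.
    """
    env = os.environ if env is None else env
    return Path(env.get("SHIV_ROOT") or Path.home() / ".shiv")


def cached_environments(zipapp_name: str,
                        root: Optional[Path] = None) -> List[Path]:
    """List cache directories that appear to belong to 'zipapp_name'.

    Assumption: directories are named '<zipapp_name>_<build id>'.
    """
    root = shiv_cache_root() if root is None else root
    if not root.is_dir():
        return []  # nothing unpacked yet
    prefix = zipapp_name + "_"
    return sorted(p for p in root.iterdir()
                  if p.is_dir() and p.name.startswith(prefix))


if __name__ == "__main__":
    # An empty list means the next run has to unpack the bundle first.
    print(cached_environments("_lib-pandas"))
```

Several directories for the same zipapp would simply mean that different builds of it have been run on this machine.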

The underscore prefix in the zipapp name indicates that this is not a command humans would normally call directly. Alternatively, and especially in production, you can deploy it into e.g. /usr/local/lib/python3.8/ and then use an absolute path instead of an env call as the script's interpreter.
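Switching a script from the env call to an absolute interpreter path can be done by hand, or with a few lines of Python as part of a deployment step. The following is a minimal sketch; pin_interpreter is a hypothetical helper written for this post and not part of shiv.

```python
from pathlib import Path


def pin_interpreter(script: Path, interpreter: str) -> bool:
    """Replace the shebang line of 'script' with '#!<interpreter>'.

    Returns True if the file was changed. Files without a shebang line
    are left untouched. (Hypothetical helper, not part of shiv.)
    """
    lines = script.read_text(encoding="utf-8").splitlines(keepends=True)
    if not lines or not lines[0].startswith("#!"):
        return False
    new_first = f"#!{interpreter}\n"
    if lines[0] == new_first:
        return False  # already pinned to this interpreter
    lines[0] = new_first
    # Writing to the existing file keeps its permissions (e.g. the +x bit).
    script.write_text("".join(lines), encoding="utf-8")
    return True
```

For the example above, the call would be pin_interpreter(Path('script'), '/usr/local/lib/python3.8/_lib-pandas'), assuming the zipapp was deployed under that path.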
