Kees C. Bakker
Posted on August 31, 2022
I have a love-hate relationship with Python: it is super easy to develop in (an abundance of examples and packages, cross-platform), but setting up your development environment (Python versions, package versions) is too cumbersome. It gets worse when you need to share that setup with another developer on another platform: I work on Windows, my colleagues on Mac and Linux.
That's why I now use Visual Studio Code and "dev containers":
The Visual Studio Code Remote - Containers extension lets you use a Docker container as a full-featured development environment. It allows you to open any folder inside (or mounted into) a container and take advantage of Visual Studio Code's full feature set.
Source: Developing inside a Container
I have some reasons for switching to this setup:
- Visual Studio Code is already my go-to tool for many languages (Node.js, bash, Terraform). Many of my colleagues also use it for various tasks.
- Running the whole dev setup in Docker helps keep the host system clean. It would not be the first time that someone "bricked" their Python setup due to a new version requirement.
- Docker gives you a nice, replicable, cross-platform setup: it should work the same on each dev machine.
Now, the main question is: will this work for Jupyter Notebooks? Visual Studio Code already provides an excellent UI, but will it run Jupyter in a development container? Let's find out! My use case is a scraper, so I need support for Puppeteer as well.
- Default Python Dev Container Setup
- Install IPython & Pandas in the Dev Container
- How about Puppeteer?
- Don't commit Jupyter Notebook output
- Improving performance
- VSCode extensions
- Final thoughts
- Changelog
Default Python Dev Container Setup
Let's start with a Python 3 development container:
- Open the command palette (F1 on Windows)
- Search for:
Remote-Containers: Add Development Container Configuration files...
This will open up a small wizard.
- Select Python 3 as the language.
- Select 3 as the version (it will add the latest anyway).
- We don't need Node.js, so select None (it can be enabled later, and skipping it makes your container build faster).
The setup generates the configuration files in the .devcontainer folder of your project.
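The generated devcontainer.json will look roughly like this; the exact contents depend on the template version, so treat this as a sketch:
{
    "name": "Python 3",
    "build": {
        "dockerfile": "Dockerfile",
        "context": "..",
        "args": {
            "VARIANT": "3.10-bullseye",
            "NODE_VERSION": "none"
        }
    }
}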
Install IPython & Pandas in the Dev Container
Now, open the Dockerfile in the .devcontainer folder and uncomment the following lines:
# [Optional] If your pip requirements rarely change, uncomment this section to add them to the image.
COPY requirements.txt /tmp/pip-tmp/
RUN pip3 --disable-pip-version-check --no-cache-dir install -r /tmp/pip-tmp/requirements.txt \
&& rm -rf /tmp/pip-tmp
This makes it possible to use a requirements.txt file to install packages with pip at the container level.
Next, add the requirements.txt to the root of your project and enter the following lines:
ipython
ipykernel
pandas
This makes sure the container installs the packages you need for your notebook.
Do I still need to install packages in my notebook?
You don't have to, but it might make things easier if you do, and it caters to people who don't use your dev container setup. With this simple line, you install all the packages specified in your requirements.txt file:
%pip install --quiet --exists-action i --disable-pip-version-check -r ../requirements.txt --user
It completes quickly, as everything is already installed in your dev container. It also makes adding new packages easier, as you don't have to rebuild your dev container.
How about Puppeteer?
We're going to use the Python version of Puppeteer called Pyppeteer. Running Puppeteer from a container is not straightforward. Let's airlift the code from this article into our setup.
Add the following lines to the Dockerfile:
# Install Google Chrome Stable and fonts
# Note: this installs the necessary libs to make the browser work with Puppeteer.
ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD true
RUN apt-get update && export DEBIAN_FRONTEND=noninteractive && apt-get install gnupg wget -y && \
wget --quiet --output-document=- https://dl-ssl.google.com/linux/linux_signing_key.pub | gpg --dearmor > /etc/apt/trusted.gpg.d/google-archive.gpg && \
sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' && \
apt-get update && \
apt-get install google-chrome-stable -y --no-install-recommends && \
rm -rf /var/lib/apt/lists/*
This will download and install Chrome into the container. Now we just need to add the Pyppeteer package to our requirements.txt file:
ipython
ipykernel
pandas
pyppeteer
Add the following line to your notebook:
%pip install --quiet --exists-action i --disable-pip-version-check pyppeteer
Now, when we launch the browser in the notebook, we only have to test if we're in our development container:
import os
from pyppeteer import launch

browser_options = {
    'headless': True,
    'args': ["--no-sandbox"]
}

# The dev container sets PUPPETEER_SKIP_CHROMIUM_DOWNLOAD, so point Pyppeteer
# at the pre-installed Chrome instead of letting it download its own Chromium.
if os.getenv('PUPPETEER_SKIP_CHROMIUM_DOWNLOAD', '') != '':
    browser_options["executablePath"] = '/usr/bin/google-chrome'

browser = await launch(browser_options)
This makes sure that the pre-installed version of Chrome is used in your container.
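Once the browser is up, a minimal usage sketch in a notebook cell could look like this (the URL is only an example):
# Minimal sketch: open a page, read its title, and clean up.
page = await browser.newPage()
await page.goto('https://example.com')
print(await page.title())
await browser.close()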
Don't commit Jupyter Notebook output
We want to improve what we commit to Git: let's not commit the output of the notebook, by implementing the code from this article. First, add the nbconvert package to your requirements.txt:
ipython
ipykernel
nbconvert
pandas
pyppeteer
pyppeteer_stealth
Let's configure Git to use nbconvert as a clean filter; add a .gitconfig file to the root of your project:
[filter "strip-notebook-output"]
clean = "jupyter nbconvert --ClearOutputPreprocessor.enabled=True --to=notebook --stdin --stdout --log-level=ERROR"
Now add a file called .gitattributes:
*.ipynb filter=strip-notebook-output
The last file you'll need to add is repo_init.sh:
#!/usr/bin/env bash
git config --local include.path ../.gitconfig
git add --renormalize .
Now, start your project in the dev container and hook things up with bash repo_init.sh. Every user who clones your repository needs to run this script once.
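If you'd rather not rely on people remembering this, devcontainer.json supports a postCreateCommand that runs after the container is created; a sketch, assuming repo_init.sh lives in the project root:
// in .devcontainer/devcontainer.json
"postCreateCommand": "bash repo_init.sh"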
Now Git is able to understand what has changed, and will no longer show a change when you have merely run the notebook. Note: VSCode will still think the notebook has changed, but Git itself will not commit the change.
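To verify that the filter is wired up, you can ask Git which filter applies to a notebook (the file name is just an example):
git check-attr filter -- my-notebook.ipynb
# my-notebook.ipynb: filter: strip-notebook-output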
Auto save?
When you run the notebook, the file changes, because notebooks store both the code and the output. I find it very annoying that VSCode shows an unsaved file in my IDE (and tries to restore it if I close the editor). To mitigate this, you can enable auto save in your dev container settings:
- Open up .devcontainer/devcontainer.json
- Navigate to customizations > vscode > settings
- Add "files.autoSave": "afterDelay"
- Add "files.autoSaveDelay": 1000 (the resulting fragment is shown below)
Pet peeve fixed.
Improving performance
Our current setup is naïve, as it does not fully leverage Docker layer caching. Let's look at our final Dockerfile:
# See here for image contents: https://github.com/microsoft/vscode-dev-containers/tree/v0.245.0/containers/python-3/.devcontainer/base.Dockerfile
# [Choice] Python version (use -bullseye variants on local arm64/Apple Silicon): 3, 3.10, 3.9, 3.8, 3.7, 3.6, 3-bullseye, 3.10-bullseye, 3.9-bullseye, 3.8-bullseye, 3.7-bullseye, 3.6-bullseye, 3-buster, 3.10-buster, 3.9-buster, 3.8-buster, 3.7-buster, 3.6-buster
ARG VARIANT="3.10-bullseye"
FROM mcr.microsoft.com/vscode/devcontainers/python:0-${VARIANT}
# [Choice] Node.js version: none, lts/*, 16, 14, 12, 10
ARG NODE_VERSION="none"
RUN if [ "${NODE_VERSION}" != "none" ]; then su vscode -c "umask 0002 && . /usr/local/share/nvm/nvm.sh && nvm install ${NODE_VERSION} 2>&1"; fi
# Install Google Chrome Stable and fonts
# Note: this installs the necessary libs to make the browser work with Puppeteer.
ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD true
RUN apt-get update && apt-get install gnupg wget -y && \
wget --quiet --output-document=- https://dl-ssl.google.com/linux/linux_signing_key.pub | gpg --dearmor > /etc/apt/trusted.gpg.d/google-archive.gpg && \
sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' && \
apt-get update && \
apt-get install google-chrome-stable -y --no-install-recommends && \
rm -rf /var/lib/apt/lists/*
# [Optional] If your pip requirements rarely change, uncomment this section to add them to the image.
COPY requirements.txt /tmp/pip-tmp/
RUN pip3 --disable-pip-version-check --no-cache-dir install -r /tmp/pip-tmp/requirements.txt \
&& rm -rf /tmp/pip-tmp
By moving the package installation below the Chrome installation, Docker can cache the Chrome layer, so you don't need to reinstall Chrome every time your Python packages change.
VSCode extensions
The nice thing about this setup is the ability to share your Visual Studio Code extensions. They are stored in the devcontainer.json, just like the settings. I'm using these extensions:
- Jupyter
- Jupyter Keymap
- Jupyter Notebook Renderers
- vscode-icons - to provide better icons in your project explorer
- TODO Highlight - to provide a clearer highlighting of stuff you still need to do
- ShellCheck - a linter for bash scripts - somehow I always end up writing scripts in my projects.
I replaced the customizations > vscode > extensions node with this:
"extensions": [
"ms-python.python",
"ms-python.vscode-pylance",
"ms-toolsai.jupyter",
"ms-toolsai.jupyter-keymap",
"ms-toolsai.jupyter-renderers",
"vscode-icons-team.vscode-icons",
"wayou.vscode-todo-highlight",
"timonwong.shellcheck"
]
When somebody opens the project, they are notified about the recommended extensions.
Final thoughts
I like the fact that this setup works on any machine. Consider pinning your versions, as neither Chrome nor your packages are pinned (so you might pull in breaking changes).
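For the Python side, pinning could look roughly like this in requirements.txt (the version numbers below are only illustrative; use the ones you actually tested with):
ipython==8.4.0
ipykernel==6.15.1
nbconvert==6.5.3
pandas==1.4.3
pyppeteer==1.0.2
pyppeteer_stealth==2.7.4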
I don't like the fact that Visual Studio Code thinks a notebook has changed while Git knows it hasn't. According to issue #9514, this is something that should be fixed in the core of Visual Studio Code. So, I'm not really sure why issue #83232 (or #24883) is closed.
Changelog
2022-08-09 Changed the %pip install line to use the requirements.txt file.