When checking your Python package sources matters
Fridolín Pokorný
Posted on April 16, 2023
In today's article, we will take a look at a small tool called Yorkshire. Its goal is to check the Python package indexes configured in projects to make sure only desired package sources are used.
A cute Yorkshire terrier, Pixabay.
Python's packaging allows users to consume packages from multiple sources - multiple Python package indexes. During an installation process, the resolution algorithm implemented in pip searches all the configured package indexes to satisfy requirements.
The resolution algorithm treats all the configured indexes as mirrors. If a package foo is available on an index A as well as on an index B, both indexes are treated with the same relevance, and only the available versions matter. The options --index-url and --extra-index-url allow specifying the primary and secondary indexes, but there is no guarantee which index is actually used. If there is a network issue, the resolution process can fall back to the secondary indexes, as they are treated as mirrors.
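To make the "mirrors" behavior more tangible, here is a deliberately simplified model of it. This is not pip's actual code: the `indexes` dictionary, the index names, and the helpers `parse_version` and `pick_candidate` are hypothetical and exist only for illustration. The sketch pools the candidates from all indexes and picks the highest version, no matter where it comes from.

```python
# Simplified model (NOT pip's real implementation): all configured indexes
# are queried and their candidates are pooled before the best one is chosen.

def parse_version(text):
    """Turn a plain dotted version such as '1.10.2' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def pick_candidate(package, indexes):
    """Return (version, index_name) of the highest version across all indexes."""
    candidates = []
    for index_name, packages in indexes.items():
        for version in packages.get(package, []):
            candidates.append((parse_version(version), version, index_name))
    if not candidates:
        raise LookupError(f"{package} not found on any configured index")
    _, version, index_name = max(candidates)
    return version, index_name


# Hypothetical index contents: "foo" is meant to come from the private index,
# but a higher version was uploaded to the public one.
indexes = {
    "private": {"foo": ["1.0.0", "1.1.0"]},
    "public": {"foo": ["99.0.0"], "flask": ["2.2.3"]},
}

print(pick_candidate("foo", indexes))  # ('99.0.0', 'public')
```

In this model, the order in which the indexes are listed plays no role, which is exactly what makes a higher version on an unexpected index attractive to an attacker.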
In some cases, users want to consume packages from index A and a specific package from index B. As of today, there is no configuration option in pip to specify which index should be used to consume the specific package. This allows dependency confusion attacks, such as the PyTorch incident.
There was a discussion on discuss.python.org about preventing dependency confusion attacks using a map file. The idea of the map file was not accepted; nevertheless, the proposal PEP 708: Extending the Repository API to Mitigate Dependency Confusion Attacks pushed the idea of preventing dependency confusion attacks further.
Until PEP 708 is accepted and implemented, there is room to check how projects configure their Python package indexes. Even if PEP 708 is accepted, it might be a good idea for organizations to check which indexes are used in order to monitor the consumption of software in their environments.
To support checks of the index configuration, a tool called Yorkshire was developed. Yorkshire checks the index configuration in files that can be used to specify project dependencies, such as requirements.txt, pyproject.toml, or Pipenv files. If multiple Python package indexes are used, Yorkshire reports it. Optionally, it can check that only allowed indexes are configured.
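To illustrate the kind of check involved, here is a minimal sketch that collects index URLs from the text of a requirements.txt file. It is not Yorkshire's implementation; the function name `find_index_urls` is hypothetical, and the sketch only handles pip's `-i`, `--index-url`, and `--extra-index-url` options written on their own lines.

```python
import re

# pip options that configure a package index inside a requirements file.
INDEX_OPTIONS = ("-i", "--index-url", "--extra-index-url")

# A comment starts at the beginning of a line or after whitespace.
COMMENT_RE = re.compile(r"(^|\s+)#.*$")


def find_index_urls(requirements_text):
    """Collect index URLs configured through pip options in a requirements file."""
    urls = []
    for raw_line in requirements_text.splitlines():
        tokens = COMMENT_RE.sub("", raw_line).split()
        for position, token in enumerate(tokens):
            # Supports both "--index-url URL" and "--index-url=URL".
            option, _, value = token.partition("=")
            if option not in INDEX_OPTIONS:
                continue
            if value:
                urls.append(value)
            elif position + 1 < len(tokens):
                urls.append(tokens[position + 1])
    return urls


sample = """\
--index-url https://pypi.org/simple/
--extra-index-url=https://test.pypi.org/simple/  # secondary index
flask>=2
"""

found = find_index_urls(sample)
if len(found) > 1:
    print("Multiple package indexes configured:", found)
```

A real tool has to cover more cases, for example options passed on the command line, environment variables, or nested requirements files included with `-r`.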
Let's take Poetry's configuration for specifying secondary indexes as an example. The linked command can generate a pyproject.toml file similar to this one:
[tool.poetry]
name = "foo"
version = "1.0.0"
description = "My package"
authors = ["Author <author@email.com>"]
[tool.poetry.dependencies]
python = "^3.6"
flask = "^2"
[[tool.poetry.source]]
name = "private_repo"
url = "https://test.pypi.org/simple/"
default = false
secondary = true
[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
The configuration above allows Poetry to consume packages hosted on PyPI as well as those hosted on the test PyPI. Which one will actually be used to consume packages? Well, it depends on the dependencies of the flask package. Note that the version might be relevant here as well, depending on the environment into which the dependencies are installed.
Let's run Yorkshire on the pyproject.toml file above:
$ yorkshire detect ./pyproject.toml
2023-04-15 20:13:44,984 [1767887] INFO yorkshire._lib: Performing detection in pyproject.toml file located at '.'
2023-04-15 20:13:44,985 [1767887] WARNING yorkshire._lib: File './pyproject.toml' uses an explicitly configured Poetry source: ['https://test.pypi.org/simple/']
As can be seen, Yorkshire issues a warning because multiple package sources can end up being used.
Next, let's specify the test PyPI to be allowed:
$ yorkshire detect --allowed-index-url "https://test.pypi.org/simple/" ./pyproject.toml
2023-04-15 20:16:33,806 [1773955] INFO yorkshire._lib: Performing detection in pyproject.toml file located at '.'
The command above shows that the test PyPI specified in the pyproject.toml is no longer flagged as a possible issue.
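Conceptually, an allow-list check compares the configured index URLs against the allowed ones. The sketch below shows one possible way to do it; the function `disallowed_indexes` is hypothetical, and ignoring a trailing slash during the comparison is our assumption, not documented Yorkshire behavior.

```python
def disallowed_indexes(configured, allowed):
    """Return the configured index URLs that are not on the allow-list."""

    def normalize(url):
        # Assumption: a trailing slash does not change the meaning of the URL.
        return url.strip().rstrip("/")

    allowed_normalized = {normalize(url) for url in allowed}
    return [url for url in configured if normalize(url) not in allowed_normalized]


configured = ["https://test.pypi.org/simple/"]

# Nothing is allowed explicitly: the test PyPI gets reported.
print(disallowed_indexes(configured, allowed=[]))

# The test PyPI is allowed: nothing is left to report.
print(disallowed_indexes(configured, allowed=["https://test.pypi.org/simple/"]))
```

Such a function could back a CI job that fails whenever the returned list is not empty.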
Yorkshire understands the requirements file types used in the Python ecosystem. Similarly to the Poetry-specific pyproject.toml configuration, Yorkshire supports configuration as used in PDM, Pipenv, pip-tools, or pip itself. All the tools have their own specifics (some of them even support assigning packages to a specific index, the use case mentioned above).
Yorkshire also provides an API, so the checks can be incorporated into other projects or systems.
Organizations can use Yorkshire in their checks or monitoring to make sure only trusted Python package indexes are used. Would you find it useful?