Discovering Python Namespace Packages
Bastien ANTOINE
Posted on January 9, 2022
Note: this article was originally published on my personal blog. Check it out here.
In a recent post about the __path__
attribute, I explained that it would be possible to manipulate a library's path to dynamically extend its functionalities. At the end of my article I detailed that this possibility has already been thought of was integrated in the import system since Python 3.3 with PEP 420, under the name of namespace package.
Today I'd like to go a bit more in details about this.
- What is a package in Python?
- The specificity of the namespace packages
- Step by step basic example
- Dynamic loading of package parts
- Limitations
- Possible usage
- More concrete example
What is a package in Python?
The Python glossary defines a package as:
A Python module which can contain submodules or recursively, subpackages. Technically, a package is a Python module with a
__path__
attribute.
So a package in Python is basically a set of .py
files (modules) organized in subdirectories (subpackages).
The __path__
attribute is specific to the packages. It is used by the import system when trying to import modules of subpackages. Basically its function is the same as sys.path
, but limited to the scope of a package. I'll let you check out my article on the __path__
attribute about it for more information.
The specificity of the namespace packages
A namespace package is a special kind of Python package: among other differences with the regular packages, the most important and notable difference is that its source files can be splited across multiple locations.
This means that the following structure can be considered a namespace package, named parent
(I'll explain later on how this can be used):
/home/myUser/dev/
└── package/
└── parent/
└── child/
└── foo.py
/var/usr/lib
└── package/
└── parent/
└── child/
└── bar.py
Now with the following example, I would have two distinct packages, both named parent
, one in /home/myUser/dev/
and another in /var/usr/lib
1.
/home/myUser/dev/
└── package/
└── parent/
├── __init__.py
└── child/
├── __init__.py
└── foo.py
/var/usr/lib
└── package/
└── parent/
├── __init__.py
└── child/
├── __init__.py
└── bar.py
Differences with the regular packages
Like I already mentioned, the most notable difference between regular and namespace packages is that the latter can be splitted across multiple locations.
This ability to split the sources also impacts the package's __path__
attribute: instead of being a list with a single element being the path of the package's __init__.py
, it is an iterable of paths, containing all the locations of the package's sources.
Keen eyes might have noticed also that I ommitted the __init__.py
files in the structure of the namespace package example above. That's not a mistake, actually namespace package are recognized and detected by the Python import system by their lack of __init__.py
files.
Step by step basic example
Take the following structure as example:
/home/myUser/dev/
└── packageExtension/
└── parent/
└── child/
└── bar.py
/var/usr/lib
└── mainPackage/
└── parent/
└── child/
└── foo.py
To make it work as a namespace package, we first need to make sure that both sources can be found in sys.path
. For the sake of the examples we'll add them by hand in it:
>>> import sys
>>> sys.path.extend([
'/var/usr/lib/mainPackage',
'/home/myUser/dev/packageExtension',
])
Now we can try to load the parent
package, and see where Python finds its source files:
>>> # Continue from previous stage
>>> import parent
>>> parent.__path__
_NamespacePath([
'/var/usr/lib/mainPackage/parent',
'/home/myUser/dev/packageExtension/parent'
])
What's interesting here is the value of parent.__path__
: like said before, it's an iterable of strings for all the locations in sys.path
where a namespace package named parent
has been found.
Now if we try to import the child
package, we should have the same thing:
>>> # Continue from previous stage
>>> import parent.child
>>> parent.child.__path__
_NamespacePath([
'/var/usr/lib/mainPackage/parent/child',
'/home/myUser/dev/packageExtension/parent/child'
])
And now we can import both foo
and bar
modules under the same package name, even though they are in different locations:
>>> # Continue from previous stage
>>> import parent.child.foo
>>> parent.child.foo.__file__
'/var/usr/lib/mainPackage/parent/child/foo.py'
>>> import parent.child.bar
>>> parent.child.bar.__file__
'/home/myUser/dev/packageExtension/parent/child/bar.py'
Dynamic loading of package parts
Now we've seen that to be able to load differents portions of a namespace package, first they all need to be accessible under sys.path
. What's cool is that if during the run, a new portion of a namespace package that's already imported were to be added to sys.path
, it'll be able to be loaded under the same namespace package.
Take the following structure as example. It the same as previously, but with /home/myOtherUser/dev/otherPackageExtension/
added:
/home/myUser/dev/
└── packageExtension/
└── parent/
└── child/
└── bar.py
/home/myOtherUser/dev/
└── otherPackageExtension/
└── parent/
└── child/
└── baz.py
/var/usr/lib
└── mainPackage/
└── parent/
└── child/
└── foo.py
Now if we continue back where we left the previous example:
>>> # Continue from previous stage
>>> parent.__path__
_NamespacePath([
'/var/usr/lib/mainPackage/parent',
'/home/myUser/dev/packageExtension/parent'
])
>>> import parent.child.baz
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'parent.child.baz'
Now if we add /home/myOtherUser/dev/otherPackageExtension/
to sys.path
, the namespace package parent
's __path__
should be recomputed to include the newly found portion:
>>> # Continue from previous stage
>>> sys.path.append('/home/myOtherUser/dev/otherPackageExtension/')
>>> sys.path
[
...,
'/var/usr/lib/mainPackage',
'/home/myUser/dev/packageExtension',
'/home/myOtherUser/dev/otherPackageExtension'
...,
]
>>> parent.__path__
_NamespacePath([
'/var/usr/lib/mainPackage/parent',
'/home/myUser/dev/packageExtension/parent',
'/home/myOtherUser/dev/otherPackageExtension/parent'
])
Which mean we can now import parent.child.baz
:
>>> # Continue from previous stage
>>> import parent.child.baz
>>> parent.child.baz.__file__
'/home/myUser/dev/otherPackageExtension/parent/child/baz.py'
Limitations
There are some limitations to this :
- All portions of a namespace package must be declared as namespace packages (ie. without
__init__.py
files). So the following is a namespace package with 2 portions (one in/home/myUser/dev/packageExtension
and one in/var/usr/lib/mainPackage
):
/home/myUser/dev/
└── packageExtension/
└── parent/
└── child/
└── bar.py
/var/usr/lib
└── mainPackage/
└── parent/
└── child/
└── foo.py
But the following is not (note the __init__.py
inside /home/myUser/dev/packageExtension
), even though /var/usr/lib/mainPackage
is a namespace package:
/home/myUser/dev/
└── packageExtension/
├── __init__.py
└── parent/
└── child/
└── bar.py
/var/usr/lib
└── mainPackage/
└── parent/
└── child/
└── foo.py
-
All the portions of a namespace package must be foundable in
sys.path
, though they don't have to be already in it when loading the namespace package, the__path__
is dynamically computed.This point has an implication on the performances: because the path is recomputed each time a portion is loaded, the importation of a module of package can take a bit more time with a namespace package, than with a regular one.
Possible usage
Now that we've discussed a bit the theory and seen a few examples, let's try to figure out real world applications. I've found two categories of usage of namespace applications:
-
A unique interface for different portions of packages provided by various vendors.
Imagine a low-level package that allows you to access various informations on your system, both software and hardware. You could have something that look a bit like this, where each part is provided by the manufacturer of the part at stake:
gpu_vendor/
└── system_management/
└── hardware/
└── gpu.py
cpu_vendor/
└── system_management/
└── hardware/
└── cpu.py
ram_vendor/
└── system_management/
└── hardware/
└── ram.py
os_vendor/
└── system_management/
└── software/
└── os.py
...
With this structure, and assuming all the portions are accessible within sys.path
, you'd have a single namespace package system_management
under which you could import system_management.hardware.gpu
or system_management.software.os
.
-
Adding optional extensions to a package
In this example you would have a main namespace package that provides basic functionalities which you could extend at will, by installing other portions. Have a look at the next section for a more concrete example.
More concrete example
Let's take the example of an image manipulation library. It has a base library, and additional extensions that allows the manipulation of various type of images (jpg, png,...). It has the following structure:
example/
├── baseLib/
│ └── img_lib/
│ └── lib/
│ ├── base.py
│ └── settings.py
├── jpg/
│ └── img_lib/
│ └── lib/
│ └── jpg.py
├── gif/
│ └── img_lib/
│ └── lib/
│ └── gif.py
├── png/
│ └── img_lib/
│ └── lib/
│ └── png.py
└── svg/
└── img_lib/
└── lib/
└── svg.py
Each image module simply declares its mimetype with the constant MIMETYPE
.
The base
module reads the list of extensions declared in settings.py
, and will load them. After that, it'll be able to show the mimetype of all the extensions available.
>>> import os, sys
>>> # For the sake of the example, add all the portions of the namespace package in sys.path by hand.
>>> os.getcwd()
<...>/example
>>> sys.path.extend([
...: os.path.join(os.getcwd(), 'baseLib'),
...: os.path.join(os.getcwd(), 'png'),
...: os.path.join(os.getcwd(), 'jpg'),
...: os.path.join(os.getcwd(), 'gif'),
...: os.path.join(os.getcwd(), 'svg'),
...: ])
>>> from img_lib.lib import base
>>> base.AVAILABLE_EXTENSIONS
['png', 'jpg']
>>> base.get_loaded_mimetypes()
['image/png', 'image/jpeg']
>>> base.AVAILABLE_EXTENSIONS.extend(['svg', 'gif'])
>>> base.load_modules()
>>> base.get_loaded_mimetypes()
['image/png', 'image/jpeg', 'image/svg+xml', 'image/gif']
All the source code of this example is available in this GitHub repo.
Sources
-
this setup would prevent you to import both modules
foo.py
andbar.py
at the same time if both/home/myUser/dev/package
and/var/usr/lib/package
where to be insys.path
at the same time. ↩
Posted on January 9, 2022
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.