Discovering Python Namespace Packages

bastantoine

Bastien ANTOINE

Posted on January 9, 2022

Discovering Python Namespace Packages

Note: this article was originally published on my personal blog. Check it out here.

In a recent post about the __path__ attribute, I explained that it would be possible to manipulate a library's path to dynamically extend its functionalities. At the end of my article I detailed that this possibility has already been thought of was integrated in the import system since Python 3.3 with PEP 420, under the name of namespace package.

Today I'd like to go a bit more in details about this.

What is a package in Python?

The Python glossary defines a package as:

A Python module which can contain submodules or recursively, subpackages. Technically, a package is a Python module with a __path__ attribute.

So a package in Python is basically a set of .py files (modules) organized in subdirectories (subpackages).

The __path__ attribute is specific to the packages. It is used by the import system when trying to import modules of subpackages. Basically its function is the same as sys.path, but limited to the scope of a package. I'll let you check out my article on the __path__ attribute about it for more information.

The specificity of the namespace packages

A namespace package is a special kind of Python package: among other differences with the regular packages, the most important and notable difference is that its source files can be splited across multiple locations.

This means that the following structure can be considered a namespace package, named parent (I'll explain later on how this can be used):

/home/myUser/dev/
└── package/
    └── parent/
        └── child/
            └── foo.py
/var/usr/lib
└── package/
    └── parent/
        └── child/
            └── bar.py
Enter fullscreen mode Exit fullscreen mode

Now with the following example, I would have two distinct packages, both named parent, one in /home/myUser/dev/ and another in /var/usr/lib1.

/home/myUser/dev/
└── package/
    └── parent/
        ├── __init__.py
        └── child/
            ├── __init__.py
            └── foo.py
/var/usr/lib
└── package/
    └── parent/
        ├── __init__.py
        └── child/
            ├── __init__.py
            └── bar.py
Enter fullscreen mode Exit fullscreen mode

Differences with the regular packages

Like I already mentioned, the most notable difference between regular and namespace packages is that the latter can be splitted across multiple locations.

This ability to split the sources also impacts the package's __path__ attribute: instead of being a list with a single element being the path of the package's __init__.py, it is an iterable of paths, containing all the locations of the package's sources.

Keen eyes might have noticed also that I ommitted the __init__.py files in the structure of the namespace package example above. That's not a mistake, actually namespace package are recognized and detected by the Python import system by their lack of __init__.py files.

Step by step basic example

Take the following structure as example:

/home/myUser/dev/
└── packageExtension/
    └── parent/
        └── child/
            └── bar.py
/var/usr/lib
└── mainPackage/
    └── parent/
        └── child/
            └── foo.py
Enter fullscreen mode Exit fullscreen mode

To make it work as a namespace package, we first need to make sure that both sources can be found in sys.path. For the sake of the examples we'll add them by hand in it:

>>> import sys
>>> sys.path.extend([
  '/var/usr/lib/mainPackage',
  '/home/myUser/dev/packageExtension',
])
Enter fullscreen mode Exit fullscreen mode

Now we can try to load the parent package, and see where Python finds its source files:

>>> # Continue from previous stage
>>> import parent
>>> parent.__path__
_NamespacePath([
  '/var/usr/lib/mainPackage/parent',
  '/home/myUser/dev/packageExtension/parent'
])
Enter fullscreen mode Exit fullscreen mode

What's interesting here is the value of parent.__path__: like said before, it's an iterable of strings for all the locations in sys.path where a namespace package named parent has been found.

Now if we try to import the child package, we should have the same thing:

>>> # Continue from previous stage
>>> import parent.child
>>> parent.child.__path__
_NamespacePath([
  '/var/usr/lib/mainPackage/parent/child',
  '/home/myUser/dev/packageExtension/parent/child'
])
Enter fullscreen mode Exit fullscreen mode

And now we can import both foo and bar modules under the same package name, even though they are in different locations:

>>> # Continue from previous stage
>>> import parent.child.foo
>>> parent.child.foo.__file__
'/var/usr/lib/mainPackage/parent/child/foo.py'
>>> import parent.child.bar
>>> parent.child.bar.__file__
'/home/myUser/dev/packageExtension/parent/child/bar.py'
Enter fullscreen mode Exit fullscreen mode

Dynamic loading of package parts

Now we've seen that to be able to load differents portions of a namespace package, first they all need to be accessible under sys.path. What's cool is that if during the run, a new portion of a namespace package that's already imported were to be added to sys.path, it'll be able to be loaded under the same namespace package.

Take the following structure as example. It the same as previously, but with /home/myOtherUser/dev/otherPackageExtension/ added:

/home/myUser/dev/
└── packageExtension/
    └── parent/
        └── child/
            └── bar.py
/home/myOtherUser/dev/
└── otherPackageExtension/
    └── parent/
        └── child/
            └── baz.py
/var/usr/lib
└── mainPackage/
    └── parent/
        └── child/
            └── foo.py
Enter fullscreen mode Exit fullscreen mode

Now if we continue back where we left the previous example:

>>> # Continue from previous stage
>>> parent.__path__
_NamespacePath([
  '/var/usr/lib/mainPackage/parent',
  '/home/myUser/dev/packageExtension/parent'
])
>>> import parent.child.baz
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'parent.child.baz'
Enter fullscreen mode Exit fullscreen mode

Now if we add /home/myOtherUser/dev/otherPackageExtension/ to sys.path, the namespace package parent's __path__ should be recomputed to include the newly found portion:

>>> # Continue from previous stage
>>> sys.path.append('/home/myOtherUser/dev/otherPackageExtension/')
>>> sys.path
[
  ...,
  '/var/usr/lib/mainPackage',
  '/home/myUser/dev/packageExtension',
  '/home/myOtherUser/dev/otherPackageExtension'
  ...,
]
>>> parent.__path__
_NamespacePath([
  '/var/usr/lib/mainPackage/parent',
  '/home/myUser/dev/packageExtension/parent',
  '/home/myOtherUser/dev/otherPackageExtension/parent'
])
Enter fullscreen mode Exit fullscreen mode

Which mean we can now import parent.child.baz:

>>> # Continue from previous stage
>>> import parent.child.baz
>>> parent.child.baz.__file__
'/home/myUser/dev/otherPackageExtension/parent/child/baz.py'
Enter fullscreen mode Exit fullscreen mode

Limitations

There are some limitations to this :

  1. All portions of a namespace package must be declared as namespace packages (ie. without __init__.py files). So the following is a namespace package with 2 portions (one in /home/myUser/dev/packageExtension and one in /var/usr/lib/mainPackage):
/home/myUser/dev/
└── packageExtension/
    └── parent/
        └── child/
            └── bar.py
/var/usr/lib
└── mainPackage/
    └── parent/
        └── child/
            └── foo.py
Enter fullscreen mode Exit fullscreen mode

But the following is not (note the __init__.py inside /home/myUser/dev/packageExtension), even though /var/usr/lib/mainPackage is a namespace package:

/home/myUser/dev/
└── packageExtension/
    ├── __init__.py
    └── parent/
        └── child/
            └── bar.py
/var/usr/lib
└── mainPackage/
    └── parent/
        └── child/
            └── foo.py
Enter fullscreen mode Exit fullscreen mode
  1. All the portions of a namespace package must be foundable in sys.path, though they don't have to be already in it when loading the namespace package, the __path__ is dynamically computed.

    This point has an implication on the performances: because the path is recomputed each time a portion is loaded, the importation of a module of package can take a bit more time with a namespace package, than with a regular one.

Possible usage

Now that we've discussed a bit the theory and seen a few examples, let's try to figure out real world applications. I've found two categories of usage of namespace applications:

  1. A unique interface for different portions of packages provided by various vendors.

    Imagine a low-level package that allows you to access various informations on your system, both software and hardware. You could have something that look a bit like this, where each part is provided by the manufacturer of the part at stake:

gpu_vendor/
└── system_management/
    └── hardware/
        └── gpu.py
cpu_vendor/
└── system_management/
    └── hardware/
        └── cpu.py
ram_vendor/
└── system_management/
    └── hardware/
        └── ram.py
os_vendor/
└── system_management/
    └── software/
        └── os.py
...
Enter fullscreen mode Exit fullscreen mode

With this structure, and assuming all the portions are accessible within sys.path, you'd have a single namespace package system_management under which you could import system_management.hardware.gpu or system_management.software.os.

  1. Adding optional extensions to a package

    In this example you would have a main namespace package that provides basic functionalities which you could extend at will, by installing other portions. Have a look at the next section for a more concrete example.

More concrete example

Let's take the example of an image manipulation library. It has a base library, and additional extensions that allows the manipulation of various type of images (jpg, png,...). It has the following structure:

example/
├── baseLib/
│   └── img_lib/
│       └── lib/
│           ├── base.py
│           └── settings.py
├── jpg/
│   └── img_lib/
│       └── lib/
│           └── jpg.py
├── gif/
│   └── img_lib/
│       └── lib/
│           └── gif.py
├── png/
│   └── img_lib/
│       └── lib/
│           └── png.py
└── svg/
    └── img_lib/
        └── lib/
            └── svg.py
Enter fullscreen mode Exit fullscreen mode

Each image module simply declares its mimetype with the constant MIMETYPE.

The base module reads the list of extensions declared in settings.py, and will load them. After that, it'll be able to show the mimetype of all the extensions available.

>>> import os, sys
>>> # For the sake of the example, add all the portions of the namespace package in sys.path by hand.
>>> os.getcwd()
<...>/example
>>> sys.path.extend([
   ...:     os.path.join(os.getcwd(), 'baseLib'),
   ...:     os.path.join(os.getcwd(), 'png'),
   ...:     os.path.join(os.getcwd(), 'jpg'),
   ...:     os.path.join(os.getcwd(), 'gif'),
   ...:     os.path.join(os.getcwd(), 'svg'),
   ...: ])
>>> from img_lib.lib import base
>>> base.AVAILABLE_EXTENSIONS
['png', 'jpg']
>>> base.get_loaded_mimetypes()
['image/png', 'image/jpeg']
>>> base.AVAILABLE_EXTENSIONS.extend(['svg', 'gif'])
>>> base.load_modules()
>>> base.get_loaded_mimetypes()
['image/png', 'image/jpeg', 'image/svg+xml', 'image/gif']
Enter fullscreen mode Exit fullscreen mode

All the source code of this example is available in this GitHub repo.

Sources


  1. this setup would prevent you to import both modules foo.py and bar.py at the same time if both /home/myUser/dev/package and /var/usr/lib/package where to be in sys.path at the same time. 

💖 💪 🙅 🚩
bastantoine
Bastien ANTOINE

Posted on January 9, 2022

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related