The trouble with __all__

Caelean Barnes

Posted on August 1, 2024

According to PEP 8, the official Python style guide:

To better support introspection, modules should explicitly declare the names in their public API using the __all__ attribute.

This sounds great! Let’s give it a go. To define a public API for a module, simply specify it in __all__ like so:

# core.py

class PublicAPI:
    def get(self) -> str:
        return "public"

class _PrivateAPI:
    def get(self) -> str:
        return "shhh"

__all__ = ["PublicAPI"]

The single underscore for _PrivateAPI is also recommended in the same PEP.

While the above does define the public API for core, it unfortunately doesn’t enforce it.

>>> from core import _PrivateAPI
>>> _PrivateAPI().get()
'shhh'

Python will even bring _PrivateAPI along for the ride if you just import core:

>>> import core
>>> core._PrivateAPI().get()
'shhh'

So what does __all__ actually do? At runtime, the only place __all__ has any effect is import *:

>>> from core import *
>>> PublicAPI().get()
'public'
>>> _PrivateAPI().get()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name '_PrivateAPI' is not defined

Wildcard imports aren’t exactly the best interface to go through when utilizing Python modules.
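
To make the import * rule concrete, here is a minimal sketch that compares a module with __all__ against the same module without it. The module names (with_all, without_all) and helper functions are hypothetical and built in memory purely for illustration.

```python
import sys
import types


def make_module(name: str, source: str) -> types.ModuleType:
    """Build an in-memory module from source and register it so it can be imported."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    sys.modules[name] = module
    return module


SOURCE = '''
def public(): return "public"
def helper(): return "helper"
def _private(): return "private"
'''

# The same definitions twice: once with __all__, once without.
make_module("with_all", SOURCE + '__all__ = ["public"]\n')
make_module("without_all", SOURCE)


def star_import(module_name: str) -> set:
    """Return the set of names that `from <module_name> import *` binds."""
    namespace: dict = {}
    exec(f"from {module_name} import *", namespace)
    namespace.pop("__builtins__", None)  # exec adds this entry on its own
    return set(namespace)


print(star_import("with_all"))     # only the names listed in __all__
print(star_import("without_all"))  # every name without a leading underscore
```

Note that helper is public by naming convention but still disappears from the wildcard import once __all__ is defined, because __all__ takes precedence over the underscore rule.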

__all__ breaks from the convention of other languages: Java and C++ have public, TypeScript has export, and Rust has pub, all of which help define and enforce the public interface.

In my time as a Python developer, I’ve rarely seen __all__ used. In fact, most developers don’t bother signifying whether a function or class should be private or public, simply because Python doesn’t offer a good way to do so.

Above all, the lack of enforced boundaries is dangerous for teams developing in the same repository.

Over time, this results in modules that were intended to be separate getting tightly coupled together, and domain boundaries breaking down.

Taken to the extreme, this can cost companies millions of dollars. I’ve seen a unicorn startup dump its existing codebase and start over because the modules within it had become so tightly coupled that it was impossible to develop effectively within it or to break the modules apart.

So what can we do? Enter this fun little code snippet (tweaked a bit for our use case):

from types import ModuleType

class ModuleWrapper(ModuleType):
    """Proxy for a module that only exposes the names listed in its __all__."""

    def __init__(self, module: ModuleType):
        super().__init__(module.__name__, module.__doc__)
        self._module = module

    def __getattr__(self, name: str):
        # Only called when normal attribute lookup on the wrapper itself fails.
        module = object.__getattribute__(self, "_module")
        if "__all__" in module.__dict__:
            if name in module.__all__:
                return getattr(module, name)
            raise AttributeError(f"Module '{module.__name__}' does not export '{name}'")
        # Could also error here if we want *everything* going through __all__
        return getattr(module, name)

Okay, now we have a way to define a public interface which actually works!

>>> import sys
>>> import core
>>> sys.modules['core'] = core.ModuleWrapper(core)
>>> import core  # re-import so the name `core` is bound to the wrapper
>>> core._PrivateAPI()
AttributeError: Module 'core' does not export '_PrivateAPI'
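
One caveat with the swap above: only imports that happen after sys.modules is patched see the wrapper. Any name bound earlier still points at the original module. The sketch below shows this with a hypothetical in-memory module, demo_core, standing in for core.py; the wrapper is a condensed copy of the one above so the example runs on its own.

```python
import sys
import types
from types import ModuleType


class ModuleWrapper(ModuleType):
    """Condensed copy of the wrapper above: only names in __all__ are reachable."""

    def __init__(self, module: ModuleType):
        super().__init__(module.__name__, module.__doc__)
        self._module = module

    def __getattr__(self, name: str):
        module = object.__getattribute__(self, "_module")
        if "__all__" in module.__dict__ and name not in module.__all__:
            raise AttributeError(f"Module '{module.__name__}' does not export '{name}'")
        return getattr(module, name)


# Hypothetical stand-in for core.py, built in memory for illustration.
original = types.ModuleType("demo_core")
exec(
    "class PublicAPI: pass\n"
    "class _PrivateAPI: pass\n"
    "__all__ = ['PublicAPI']\n",
    original.__dict__,
)
sys.modules["demo_core"] = original

import demo_core as early  # bound before the swap: the original module
sys.modules["demo_core"] = ModuleWrapper(original)
import demo_core as late   # bound after the swap: the wrapper


def can_reach_private(module: ModuleType) -> bool:
    return hasattr(module, "_PrivateAPI")


print(can_reach_private(late))   # False
print(can_reach_private(early))  # True
```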

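This isn’t the prettiest solution, nor the most scalable: every module has to be wrapped by hand, and any code that imported core before the swap still holds the original, unwrapped module. To fix this, we can replace Python’s built-in import function with a hook that applies ModuleWrapper to each newly imported module: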

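# hook.py
import builtins
import sys

# ModuleWrapper is the class defined above; it is assumed to live in this file.

class WrapperFinder:
    def __init__(self):
        self.original_import = builtins.__import__

    def strict_import(self, fullname: str, *args, **kwargs):
        # Modules that are already loaded (wrapped or not) are returned as-is.
        if fullname in sys.modules:
            return sys.modules[fullname]
        module = self.original_import(fullname, *args, **kwargs)
        wrapped = ModuleWrapper(module)
        sys.modules[fullname] = wrapped
        return wrapped

def enable_strict_imports():
    finder = WrapperFinder()
    # Assign through the builtins module: __builtins__ is a dict in some
    # contexts and a module in others, so indexing it is not reliable.
    # The finder is not added to sys.meta_path because it has no find_spec.
    builtins.__import__ = finder.strict_import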

Finally, the behavior we want! enable_strict_imports will impact any imports invoked after it’s called.

>>> from hook import enable_strict_imports
>>> enable_strict_imports()
>>> import core
>>> core.PublicAPI().get()
'public'
>>> core._PrivateAPI().get()
AttributeError: Module 'core' does not export '_PrivateAPI'
>>> from core import _PrivateAPI
ImportError: cannot import name '_PrivateAPI' from 'core'

If you find this useful, all the code is here!

A Better Solution

While we now have a way to enforce imports going through __all__, there are a few downsides:

  • There’s a requirement to run enable_strict_imports before any other imports
  • There’s runtime impact
  • There’s no concept of scope

Enter Tach, a tool we built to help resolve these issues. With Tach, you can declare each module and define a strict interface through __all__. It has no runtime impact because it’s enforced through static analysis. You also get more fine-grained control over which modules can see each other.

Let’s take a look at doing that with our little sample codebase! In this case, we’ll move from the Python shell to main.py to run our imports.

# main.py
from core import _PrivateAPI  # bad!! 

First off, pip install tach.

tach mod lets us mark core and main as modules.

tach sync creates the following configuration for our project:

# tach.yml
modules:
  - path: core
    depends_on: []
  - path: main
    depends_on:
      - path: core

With one line (strict: true), we can enforce strict imports on core:

modules:
  - path: core
    strict: true  # enforce imports of core go through __all__
    depends_on: []
  - path: main
    depends_on:
      - path: core

And with tach check, we can ensure that core is used correctly by main!

Let’s see what we get:

> tach check
 main.py[L1]: Module 'core' is in strict mode. Only imports from the public interface of this module are allowed. The import 'core._PrivateAPI' (in module 'main') is not included in __all__.

Tach caught it! If we update main.py to the correct usage:

# main.py
from core import PublicAPI

tach check gives us:

> tach check
 All module dependencies validated!

The ergonomics are different: instead of a runtime failure, we have a CLI command that we can run in CI, from a file watcher, or as a pre-commit hook. That being said, we’ve fixed all the downsides of the runtime approach:

  • There’s no requirement to import and execute code for it to work
  • There’s no runtime impact
  • We can gain more control of the dependency graph! (In the above example, core can’t import main)
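
For intuition on how a check like this can work without importing or running anything, here is a minimal sketch built on the standard ast module. It is not how Tach is implemented; the function names (exported_names, find_violations) and the inlined sources are hypothetical, and it only handles a literal __all__ and from-imports.

```python
import ast
from typing import List, Optional, Set, Tuple


def exported_names(module_source: str) -> Optional[Set[str]]:
    """Return the names in a literal top-level `__all__ = [...]`, or None if absent."""
    for node in ast.parse(module_source).body:
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id == "__all__":
                    return set(ast.literal_eval(node.value))
    return None


def find_violations(
    importer_source: str, module_name: str, exports: Optional[Set[str]]
) -> List[Tuple[int, str]]:
    """List (line, name) for each `from module_name import name` not in exports."""
    if exports is None:
        return []  # no __all__ declared, so there is nothing to enforce
    violations = []
    for node in ast.walk(ast.parse(importer_source)):
        if isinstance(node, ast.ImportFrom) and node.module == module_name and node.level == 0:
            for alias in node.names:
                if alias.name != "*" and alias.name not in exports:
                    violations.append((node.lineno, alias.name))
    return violations


# Sources are inlined as strings here; a real tool would read them from disk.
CORE_SOURCE = '''
class PublicAPI: ...
class _PrivateAPI: ...
__all__ = ["PublicAPI"]
'''
MAIN_SOURCE = "from core import _PrivateAPI  # bad!!\nfrom core import PublicAPI\n"

exports = exported_names(CORE_SOURCE)
print(find_violations(MAIN_SOURCE, "core", exports))  # [(1, '_PrivateAPI')]
```

Because the sources are only parsed, never executed, a check of this shape adds nothing at runtime. The sketch does not catch attribute access such as core._PrivateAPI after a plain import core, which would need extra analysis.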

If you’ve dealt with this problem before, I’d love to hear your solution! All the above code is open source and can be found here and here.

At Gauge, we’re working to solve the monolith/microservices dilemma. If that sounds like something you’re dealing with, we’d love to chat!
