Obfuscate Django models ids by encoding them as non-sequential non-predictable strings

ndrbrt

Andrea Bertoloni

Posted on June 15, 2020

Obfuscate Django models ids by encoding them as non-sequential non-predictable strings

TL;DR
Replace urls like example.com/users/1/ with something like example.com/users/zpA1nRdJbwG/ without changing the actual ids in your db.
(Also applicable to stuff other than urls, e.g. replace api responses like { "id": 1, "email": "user@example.com" } with something like { "id": "zpA1nRdJbwG", "email": "user@example.com" })

Sometimes, whatever the reasons, you may want to change the ids of your models, which usually are incremental integers, with some non-sequential non-predictable codes instead.
So, instead of having - for example - an url like example.com/users/1/, you would have something like example.com/users/zpA1nRdJbwG/.

A common solution for this is to use UUIDs as the primary keys for your models. Similarly, you could generate some other [random] unique strings when you create an object in the db.
The problem here is that we're going to lose all the benefits that come from using incremental integers as primary keys.

So, a solution might be to encode/decode the ids in the application layer, without touching the actual incremental numeric ids in the database.
To do so we're going to integrate Hashids, a library "that generates short, unique, non-sequential ids from numbers", into Django.

Install Hashids with

$ pip install hashids
# or pipenv install hashids
# or poetry add hashids

URLs

The first step is to register a custom path converter, so that we can then define a url as path('users/<hashids:user_id>/', ...) that will automatically pass the decoded integer id in the related view.

Create a folder named ids_encoder at the same level as any other Django app, with two empty files inside: __init__.py converters.py.

# inside your project root directory
$ mkdir ids_encoder
$ cd ids_encoder
$ touch __init__.py converters.py

You should end up with something like this

mysite/
|-- db.sqlite3
|-- ids_encoder
|   |-- __init__.py
|   `-- converters.py
|-- manage.py
|-- mysite
|   |-- __init__.py
|   |-- settings.py
|   |-- urls.py
|   `-- wsgi.py
`-- ...

In ids_encoder/converters.py, paste this code

class HashidsConverter():
    regex = '[0-9a-zA-Z]+'

    def to_python(self, value: str) -> int:
        hashids = Hashids()
        decoded_values = hashids.decode(value)
        # output of hashids.decode is always a tuple
        if len(decoded_values) != 1:
            raise ValueError
        return decoded_values[0]

    def to_url(self, value: int) -> str:
        hashids = Hashids()
        return hashids.encode(value)

This class has the shape required for a custom path converter:

  • a regex (as a string) that matches the parameter passed in the url
  • a method to_python which converts the matched string into the type that should be passed to the view function
  • a method to_url which converts the Python type into a string to be used in the URL

We now need to register the custom converter, so go in mysite/urls.py

from django.urls import register_converter

from ids_encoder import converters

register_converter(converters.HashidsConverter, 'hashids')

urlpatterns = []

To test that everything works, we just write a simple view that returns the exact params the it receives (remind: the view itself receives the already decoded value).

In mysite/urls.py

from django.urls import path, register_converter
from django.http import HttpResponse

from ids_encoder import converters

register_converter(converters.HashidsConverter, 'hashids')

def test_user_id(request, user_id):
    return HttpResponse(user_id)

urlpatterns = [
    path('users/<hashids:user_id>/', test_user_id, name='test_user_id'),
]

We are now ready to test it.
Generate a hashid like this

$ ./manage.py shell -c 'from hashids import Hashids; print(Hashids().encode(1))'
jR

Now, run the development server and send a request

$ ./manage.py runserver

and from another shell

$ curl localhost:8000/users/jR/
1

If everything is correct you should see that the curl command returns 1, which is the value we encoded in the previous command.

Ok, the custom path converter works just fine, yet the encoded values are still easily guessable by anyone with just as little as Hashids().decode(jR).
That is why Hashids - no big surprise - let us set a salt (also a min_length and a custom alphabet).

So now we are going to set our custom values in settings.py and doing a bit of refactoring, ending up with custom encode/decode utility functions to also being able to encode/decode ids in other parts of our project (for example in a DRF serializer).

Salt, min_length and encode/decode utilities

In mysite/settings.py add this dict

HASHIDS = {
    # SECURITY WARNING: keep the salt used in production secret!
    'SALT': 'Nel mezzo del cammin di nostra vita',
    'MIN_LENGTH': 11
}

Create a new file ids_encoder/utils.py and paste this code in it

from django.conf import settings

def get_params():
    try:
        HASHIDS = settings.HASHIDS
    except:
        HASHIDS = {}

    salt = HASHIDS.get('SALT')
    min_length = HASHIDS.get('MIN_LENGTH')
    res = {}
    if salt: res['salt'] = salt
    if min_length: res['min_length'] = min_length

    return res

def get_regex(params):
    min_length = params.get('min_length')
    if min_length is not None:
        return f'[0-9a-zA-Z]{{{ min_length },}}'
    return '[0-9a-zA-Z]+'

PARAMS = get_params()
REGEX = get_regex(PARAMS)

get_params reads our custom settings and convert that in dict that can then be passed to the Hashids initilizer.
get_regex returns the appropriate regex according to settings (e.g. if no min length is set the regex would be '[0-9a-zA-Z]+', if min length is 11 the regex would be '[0-9a-zA-Z]{11,}')

Now in ids_encoder/__init__.py

from .utils import PARAMS
from hashids import Hashids

hashids = Hashids(**PARAMS)

def encode_id(_id: int) -> str:
    return hashids.encode(_id)

def decode_id(_id: str) -> int:
    decoded_values = hashids.decode(_id)
    # output of hashids.decode is always a tuple
    if len(decoded_values) != 1:
        raise ValueError
    return decoded_values[0]

These are the encode/decode functions that we could later import anywhere in the project.

And finally, in ids_encoder/converters.py, replace the previous code with

from .utils import REGEX
from . import encode_id, decode_id


class HashidsConverter():
    regex = REGEX

    def to_python(self, value: str) -> int:
        return decode_id(value)

    def to_url(self, value: int) -> str:
        return encode_id(value)

Ok, now everything's done. We can test it as we did before.
Generate a hashid like this (note, we are now using our own encode_id function to generate the hashid)

$ ./manage.py shell -c 'from ids_encoder import encode_id; print(encode_id(1))'
zpA1nRdJbwG

Run the development server and send a request

$ ./manage.py runserver

and from another shell

$ curl localhost:8000/users/zpA1nRdJbwG/
1

Example within a DRF serializer

If you want to obfuscate the id field in the response of a Django REST Framework view, I would define a serializer like this

from rest_framework.serializers import ModelSerializer
from django.contrib.auth import get_user_model

from ids_encoder import encode_id


class UsersSerializer(ModelSerializer):
    class Meta:
        model = get_user_model()
        fields = ['id', 'email']

    def to_representation(self, instance):
        """Convert id to hashid"""
        res = super().to_representation(instance)
        res['id'] = encode_id(res['id'])
        return res

Note that you may also want to override the to_internal_value method of a DRF serializer (refer to the DRF docs).

💖 💪 🙅 🚩
ndrbrt
Andrea Bertoloni

Posted on June 15, 2020

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related