Obfuscate Django models ids by encoding them as non-sequential non-predictable strings
Andrea Bertoloni
Posted on June 15, 2020
TL;DR
Replace urls likeexample.com/users/1/
with something likeexample.com/users/zpA1nRdJbwG/
without changing the actual ids in your db.
(Also applicable to stuff other than urls, e.g. replace api responses like{ "id": 1, "email": "user@example.com" }
with something like{ "id": "zpA1nRdJbwG", "email": "user@example.com" }
)
Sometimes, whatever the reasons, you may want to change the ids of your models, which usually are incremental integers, with some non-sequential non-predictable codes instead.
So, instead of having - for example - an url like example.com/users/1/
, you would have something like example.com/users/zpA1nRdJbwG/
.
A common solution for this is to use UUIDs as the primary keys for your models. Similarly, you could generate some other [random] unique strings when you create an object in the db.
The problem here is that we're going to lose all the benefits that come from using incremental integers as primary keys.
So, a solution might be to encode/decode the ids in the application layer, without touching the actual incremental numeric ids in the database.
To do so we're going to integrate Hashids, a library "that generates short, unique, non-sequential ids from numbers", into Django.
Install Hashids with
$ pip install hashids
# or pipenv install hashids
# or poetry add hashids
URLs
The first step is to register a custom path converter, so that we can then define a url as path('users/<hashids:user_id>/', ...)
that will automatically pass the decoded integer id in the related view.
Create a folder named ids_encoder
at the same level as any other Django app, with two empty files inside: __init__.py
converters.py
.
# inside your project root directory
$ mkdir ids_encoder
$ cd ids_encoder
$ touch __init__.py converters.py
You should end up with something like this
mysite/
|-- db.sqlite3
|-- ids_encoder
| |-- __init__.py
| `-- converters.py
|-- manage.py
|-- mysite
| |-- __init__.py
| |-- settings.py
| |-- urls.py
| `-- wsgi.py
`-- ...
In ids_encoder/converters.py
, paste this code
class HashidsConverter():
regex = '[0-9a-zA-Z]+'
def to_python(self, value: str) -> int:
hashids = Hashids()
decoded_values = hashids.decode(value)
# output of hashids.decode is always a tuple
if len(decoded_values) != 1:
raise ValueError
return decoded_values[0]
def to_url(self, value: int) -> str:
hashids = Hashids()
return hashids.encode(value)
This class has the shape required for a custom path converter:
- a regex (as a string) that matches the parameter passed in the url
- a method
to_python
which converts the matched string into the type that should be passed to the view function - a method
to_url
which converts the Python type into a string to be used in the URL
We now need to register the custom converter, so go in mysite/urls.py
from django.urls import register_converter
from ids_encoder import converters
register_converter(converters.HashidsConverter, 'hashids')
urlpatterns = []
To test that everything works, we just write a simple view that returns the exact params the it receives (remind: the view itself receives the already decoded value).
In mysite/urls.py
from django.urls import path, register_converter
from django.http import HttpResponse
from ids_encoder import converters
register_converter(converters.HashidsConverter, 'hashids')
def test_user_id(request, user_id):
return HttpResponse(user_id)
urlpatterns = [
path('users/<hashids:user_id>/', test_user_id, name='test_user_id'),
]
We are now ready to test it.
Generate a hashid like this
$ ./manage.py shell -c 'from hashids import Hashids; print(Hashids().encode(1))'
jR
Now, run the development server and send a request
$ ./manage.py runserver
and from another shell
$ curl localhost:8000/users/jR/
1
If everything is correct you should see that the curl command returns 1
, which is the value we encoded in the previous command.
Ok, the custom path converter works just fine, yet the encoded values are still easily guessable by anyone with just as little as Hashids().decode(jR)
.
That is why Hashids - no big surprise - let us set a salt (also a min_length and a custom alphabet).
So now we are going to set our custom values in settings.py
and doing a bit of refactoring, ending up with custom encode/decode utility functions to also being able to encode/decode ids in other parts of our project (for example in a DRF serializer).
Salt, min_length and encode/decode utilities
In mysite/settings.py
add this dict
HASHIDS = {
# SECURITY WARNING: keep the salt used in production secret!
'SALT': 'Nel mezzo del cammin di nostra vita',
'MIN_LENGTH': 11
}
Create a new file ids_encoder/utils.py
and paste this code in it
from django.conf import settings
def get_params():
try:
HASHIDS = settings.HASHIDS
except:
HASHIDS = {}
salt = HASHIDS.get('SALT')
min_length = HASHIDS.get('MIN_LENGTH')
res = {}
if salt: res['salt'] = salt
if min_length: res['min_length'] = min_length
return res
def get_regex(params):
min_length = params.get('min_length')
if min_length is not None:
return f'[0-9a-zA-Z]{{{ min_length },}}'
return '[0-9a-zA-Z]+'
PARAMS = get_params()
REGEX = get_regex(PARAMS)
get_params
reads our custom settings and convert that in dict that can then be passed to the Hashids initilizer.
get_regex
returns the appropriate regex according to settings (e.g. if no min length is set the regex would be '[0-9a-zA-Z]+'
, if min length is 11
the regex would be '[0-9a-zA-Z]{11,}'
)
Now in ids_encoder/__init__.py
from .utils import PARAMS
from hashids import Hashids
hashids = Hashids(**PARAMS)
def encode_id(_id: int) -> str:
return hashids.encode(_id)
def decode_id(_id: str) -> int:
decoded_values = hashids.decode(_id)
# output of hashids.decode is always a tuple
if len(decoded_values) != 1:
raise ValueError
return decoded_values[0]
These are the encode/decode functions that we could later import anywhere in the project.
And finally, in ids_encoder/converters.py
, replace the previous code with
from .utils import REGEX
from . import encode_id, decode_id
class HashidsConverter():
regex = REGEX
def to_python(self, value: str) -> int:
return decode_id(value)
def to_url(self, value: int) -> str:
return encode_id(value)
Ok, now everything's done. We can test it as we did before.
Generate a hashid like this (note, we are now using our own encode_id
function to generate the hashid)
$ ./manage.py shell -c 'from ids_encoder import encode_id; print(encode_id(1))'
zpA1nRdJbwG
Run the development server and send a request
$ ./manage.py runserver
and from another shell
$ curl localhost:8000/users/zpA1nRdJbwG/
1
Example within a DRF serializer
If you want to obfuscate the id field in the response of a Django REST Framework view, I would define a serializer like this
from rest_framework.serializers import ModelSerializer
from django.contrib.auth import get_user_model
from ids_encoder import encode_id
class UsersSerializer(ModelSerializer):
class Meta:
model = get_user_model()
fields = ['id', 'email']
def to_representation(self, instance):
"""Convert id to hashid"""
res = super().to_representation(instance)
res['id'] = encode_id(res['id'])
return res
Note that you may also want to override the to_internal_value
method of a DRF serializer (refer to the DRF docs).
Posted on June 15, 2020
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.