Options for public facing IDs in Django

spikelanterncom

spikelantern

Posted on July 27, 2020

Django uses auto-incrementing integer columns as primary keys for models by default. If you've done some basic tutorials you may have used these integer IDs as identifiers in URLs or HTTP responses.

For example, things like /posts/23/ where 23 is the primary key of a Post.

At some point you may have read or received advice that it's probably not a good idea to expose these auto-incrementing IDs publicly. One reason is that they leak information about the size of a database table, which may be undesirable for a few reasons (like leaking information to competitors, or even to a potential attacker). Additionally, sequential IDs are easy for a potential attacker to guess, and if you don't secure your endpoints properly you may be more vulnerable to IDOR (insecure direct object reference) situations.

Have you wondered what you can use instead? Luckily, you have a few good alternatives.

Maybe just use another unique field as an identifier

Using another unique field is the most straightforward approach, assuming it's URL-safe. For example, for a profile page in a social networking platform, you could simply use the username.

Both Twitter and Instagram do this, as do most social networking sites I'm aware of, e.g.:

https://twitter.com/spikelanterncom

Slugs

In many cases it may be appropriate to use something called a "slug". Django has a built-in field for this called SlugField, and in their documentation they have a good definition for what a slug is:

Slug is a newspaper term. A slug is a short label for something, containing only letters, numbers, underscores or hyphens. They’re generally used in URLs.

So, if you have something that naturally lends itself to slugs, such as articles, blog posts, or anything else with a unique title, slugs might be a decent option.

UUIDs

Another option is to use Django's built-in UUIDField.

What does a UUID look like? You can generate one yourself by firing up the Python REPL and running this code:

>>> import uuid
>>> my_uuid = uuid.uuid4()
>>> print(my_uuid)
f22db40f-3e1e-4ac2-9c10-62ecea301f5e

These are URL-safe, which means you can safely put them in URLs.

So instead of /products/23/ you'll see /products/f22db40f-3e1e-4ac2-9c10-62ecea301f5e/

It's kind of ugly, but not too bad. To make it shorter, you can run something like a base62 or base58 encoding on the UUID to produce a more compact string. The django-extensions library does this in ShortUUIDField (more on that later).

Note that if you do this you should keep storing the value in a UUIDField rather than a CharField; see this blog post for a good discussion.
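If you are curious what such an encoding step might look like, here is a minimal base62 sketch using only the standard library. The alphabet ordering and the encode_uuid / decode_uuid names are my own assumptions for illustration, and this is not how django-extensions implements ShortUUIDField. The stored value stays a UUID; the short string is only used at the edges (URLs and responses).

```python
import string
import uuid

# Digits, then uppercase, then lowercase. This ordering is an assumption;
# libraries differ, so pick one and never change it afterwards.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase
BASE = len(ALPHABET)  # 62


def encode_uuid(value: uuid.UUID) -> str:
    """Encode the UUID's 128-bit integer as a base62 string."""
    number = value.int
    if number == 0:
        return ALPHABET[0]
    chars = []
    while number:
        number, remainder = divmod(number, BASE)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


def decode_uuid(text: str) -> uuid.UUID:
    """Reverse encode_uuid; raises ValueError on invalid input."""
    number = 0
    for char in text:
        number = number * BASE + ALPHABET.index(char)
    return uuid.UUID(int=number)


original = uuid.uuid4()
short = encode_uuid(original)
print(short, len(short))  # at most 22 characters
print(decode_uuid(short) == original)  # True
```

The output is at most 22 characters, compared to 36 for the standard hyphenated form.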

Base64 encode a few bytes from /dev/urandom

You could also grab a few bytes from /dev/urandom and do a url-safe base64 encoding.

How many bytes are enough? I recently found this fantastic article from Neil Madden, titled "Moving away from UUIDs" which compared this method to using UUIDs, and using some math, determined that using 160 bits (20 bytes) is a good option for general random strings like access tokens.

For an identifier, you could probably use something much shorter, but I found that 160 bits works okay for URLs anyway, and the result is still shorter than the standard representation of a UUID. UUIDs can be shortened too if you re-encode them, since the standard representation is just a string for a 128-bit value. We'll see options for that later in this article.
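To put numbers on that comparison, this short sketch measures the three representations using only the standard library. The urlsafe_b64 helper is a name I made up for illustration; it strips the trailing "=" padding, which is why the lengths are not multiples of four.

```python
import base64
import os
import uuid


def urlsafe_b64(raw: bytes) -> str:
    """URL-safe base64 with the trailing '=' padding stripped."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


# Standard form: 32 hex digits plus 4 hyphens
standard_uuid = str(uuid.uuid4())
# The same 128 bits (16 bytes), base64-encoded
encoded_uuid = urlsafe_b64(uuid.uuid4().bytes)
# 160 random bits (20 bytes), base64-encoded
random_160 = urlsafe_b64(os.urandom(20))

print(len(standard_uuid), len(encoded_uuid), len(random_160))  # 36 22 27
```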

Here's some code to generate such a random string:

import os
from django.utils.http import urlsafe_base64_encode as b64encode

def generate_random_field(num_bytes=20):
    # num_bytes avoids shadowing the built-in "bytes" type
    return b64encode(os.urandom(num_bytes))

Try running that on your Python shell:

>>> generate_random_field()
'OVXxfipWi2VdPC8GGCKmlR6oDhc'
>>> generate_random_field()
'8ApcI0mvC_a_MTJN8Hj4um_DAsQ'
>>> generate_random_field()
'W9myCGsv93zo0vk5x9rLyd9cwI0'

The result is a 27-character string, which you could just put in a CharField, but you probably want to write a custom field like this:

import os
from django.db.models import CharField
from django.utils.http import urlsafe_base64_encode as b64encode

def generate_random_field():
    return b64encode(os.urandom(20))

class RandomIDField(CharField):
    description = "A random field meant for URL-safe identifiers"

    def __init__(self, **kwargs):
        kwargs['max_length'] = 27

        super().__init__(**kwargs)

    def deconstruct(self):
        name, path, args, kwargs = super().deconstruct()
        del kwargs["max_length"]

        return name, path, args, kwargs

    def pre_save(self, model_instance, add):
        """
        This is used to ensure that we auto-set values if required.
        See CharField.pre_save
        """
        value = super().pre_save(model_instance, add)
        if not value:
            value = generate_random_field()
            setattr(model_instance, self.attname, value)
        return value

Note that the above is for the specific case of 20 bytes, and I didn't bother including any configuration options. If you want to configure this field by passing in the number of bytes you want, you might want to modify the code.

I would recommend just copying that code to your project directly and modifying as needed, but if for some reason you want it in a package, I have published it to PyPI. You can see the code here: https://github.com/spikelantern/django-randomcharfield

The resulting string can have hyphens and underscores, which you might be okay with, or not. I've personally found that this is a very simple method that is acceptable for most of my needs.

Use django-extensions fields

The popular package django-extensions has a few options for such fields.

If you look at the documentation for field extensions you will find several options you can use.

In particular, they offer ShortUUIDField and RandomCharField (and AutoSlugField if you want to automate the generation of slug fields).

Their implementation of RandomCharField chooses from a limited set of characters randomly, and thus can be configured to not use any hyphens or underscores.

Tip: You don't need to replace your primary key

By the way, even if you choose any of these options, you don't need to replace your default primary key. An auto-incrementing integer as your primary key has some benefits, like being easy to sort (e.g. if you don't have a timestamp for when the entry is added). Also it's compact, and easy to work with in, say, the Django admin.

What you can do is add a second field to your model which you use for anything external, and keep using AutoField for your internal operations.

Tip: Look at your responses too

Make sure you remove the auto-incrementing IDs from HTTP responses as well. It's pointless to just change your URLs to hide the IDs, only to leak them somewhere else!

Takeaways

You have a number of options if you are concerned about revealing auto-incrementing IDs publicly (e.g. in URLs).

If you already have an existing unique field that can be public (e.g. a username), then that is normally a good option to use. Slugs can be good options for when you have a resource that fits well such as articles or blog posts. Both of these options will give you nice, human-readable URLs.

Some other alternatives discussed were UUIDs and random character fields, by either writing your own custom field, or using a library like django-extensions.

Hope you found this post helpful! If you enjoyed this, please subscribe to my mailing list and receive some very occasional emails about Django.
