Powerful filtering for Django Rest Framework

maxipavlovic

Max Kolyubyakin

Posted on June 1, 2020

Powerful filtering for Django Rest Framework

Overview

Sooner or later each web project experiences necessity for complex filtering. A social app doesn't make sense without search by connections. A good supply automation platform needs to filter orders by products or marketplaces. Any online store in 2020 allows you to filter goods by N different categories… I could continue this list forever, but I hope that idea is pretty clear. The open question, though, is how complex this filtering needs to be and what it takes to implement it.

Filtering 🌪️ feels so natural nowadays, so it's considered that every mature framework like Django Rest Framework has all the tools you need. DRF really provides a wide range of libraries, which cover most of the generic technical scenarios. When you release an MVP and domain model of the project is pretty simple, some library like Django-Filter might seem a silver bullet. But as the project gets bigger, APIs expand and domain model gets more and more complex. Pretty soon it can be found out that existing tools do not cover all of your business scenarios and need to be replaced with something more powerful. That's where Django-RQL comes into play!

To understand the true reasoning for creation of yet-another-library, I want to mention that in CloudBlue Connect we approach filtering as an important business feature. As any other product feature, it all starts with requirements. And here they are:

  • filters must be human-readable (double underscores are not okay!);
  • there are plenty of business cases when OR or NOT logical operators are needed (even with nesting) or IN/OUT list operations, so they must be natively supported by framework;
  • filter configuration must be declarative (less code is better!);
  • but there are many cases when filter logic does not map to DB (ORM) structure and these cases must be equally supported;
  • Connect is on it's way to become a member of OpenAPI initiative, which means that framework must support auto-documentation in OpenAPI 3.0 format;
  • filtering must be really thoroughly tested;
  • there must be a way to affect objects API representation (we need to support inclusion/exclusion of fields through query parameters).

Django-RQL was created with experience of usage of other awesome filtering libraries to cover all of these requirements. In this article I will quickly describe how exactly framework achieves to do that and how anyone could use it in their projects to make awesome REST APIs.

We have been using this library in Production in several services for over a year already. It's well tested (has 100% coverage), fast, easily configured and has lot's of great features.To learn more, please, refer to our community documentation or to the video below.


Human Readable Filters

Double underscore notation is the first class citizen in Django world: it's used everywhere for field and relation lookups. Most of the libraries for filtering in Django or DRF use the same notation for filtering in query parameters. To illustrate the case, let's imagine the following domain model: a Company owns several Products with different Categories. Customers of our service ask us to expose companies JSON REST API with possibility to filter companies by name and by categories of their products. The query string could then look like:

/companies?products__category__id__in=CAT1-ID,CAT2=ID&name__startswith=Clou

There are several problems in this approach:

  • In Javascript world, where JSON comes from, dot notation is used to reference nested objects. Plus, in general, dot is a much wider used separator type.
  • Operators, like in or startswith become part of the properties and it becomes hard to distinguish filtered property from operation type.
  • It's not quite clear if CAT2=ID is a separate operation or is a value of a previous comparison (brackets can't be used).

In RQL notation the previous query could look like:

/companies?in(products.category.id,(CAT1-ID,"CAT2=ID"))&like(name,Clou*)

This query gets rid of all the drawbacks of previous approach: dots are used instead of underscores, operators and values are visually distinguishable. But frankly speaking, in and like RQL operators may seem unfamiliar to you. Thus I guess, it's the right time to describe what "RQL" abbreviation means.

RQL (Resource query language) is designed for modern application development. It is built for the web, ready for NoSQL, and highly extensible with simple syntax. This is a query language for fast and convenient database interaction. RQL was designed for use in URLs to request object-style data structures.

I don't want to dive into language description deeply, as use-cases from the following partitions will show practically everything you need to know about this simple language. But if you are willing to lear more about RQL, here is a good link for you:

https://www.sitepen.com/blog/resource-query-language-a-query-language-for-the-web-nosql/

Supported Operators

In previous paragraph we already saw two basic RQL operators: in and like. Django-RQL supports a much wider number of operators:

  • Comparison (eq, ne, gt, ge, lt, le, like, ilike, search)
/companies?name=Company&ge(created.at,2019-02-12)&products.id=ne=PRODUCT-ID

Let's separately look at parts of this query. name=Company is the syntactic sugar over eq(name,Company) operation. As exact equality comparison is the most widely used operation a special syntax was created for convenience of usage. ge(created.at,2019-02-12) can be translated as "Company is created Greater or Equal (later) than 12th of February in 2019". The last expression products.id=ne=PRODUCT-ID is an alias to the following operation ne(products.id,PRODUCT-ID), which can be translated as "Company, which has no PRODUCT-ID products". The syntax from the last expression can be used if you don't expect to have much complex operations or simply if you like it more. I would only suggest to not mix different styles within the same query expression as it becomes harder to read it.

Let's also consider the scenario, where we want to search by company name, but we only remember, that it has "loud" inside. In that case we have the following options:

/companies?like(name,*loud*)
/companies?ilike(name,ClOuD*)
/companies?search=loUd

If we had a company, named CloudBlue Connect, in our service, each of the above queries would return it to us in the response. like allows us to search a certain field by a provided value pattern (* equals to any symbol). ilike is the case-insensitive version of like operator. search is the alias operator to OR ilike filtering by all filters that support such lookups.

It's important to understand, that each supported filter has a certain defined type. Supported lookups are calculated automatically by Django-RQL, based on that type. For example, Company Name can be declared as non-nullable CharField in Django Model. By default the following operators will be supported for name filter: eq, ne, like, ilike. But you can always override everything, the only real limits are the possibilities of the chosen DB engine.

  • List (in, out)
/companies?in(id,(COMPANY-ID1,COMPANY-ID2))

In many UI cases it's needed to query data from one collection and then extract related data from another one in a bulk operation. That's where in operator comes in handy. out is just an equivalent of not in.

  • Logical (and, or, not)
/companies?not(name=default)&(products.id=PRD-ID1|like(name=def*))

That's where Django-RQL starts to show it's true power! OR is the first class citizen in RQL world, which is not a common case for the majority of DRF filtering frameworks. Plus, RQL queries support deep logical nesting, which can be costly for your DB engine, but is a great feature for your API consumers.

Anyway, let's analyze the above query. In a natural language it would sound like "Give me companies, which don't have default name AND which own a PRD-ID1 product OR have a name starting with def". Pretty powerful, isn't it? Moreover, in RQL there are several notations for AND and OR operators and you can choose any which you like more. AND operations can have the following structure: expr&expr&... or expr,expr,... or and(expr,expr,...); while OR: (expr|expr|...) or (expr;expr;...) or or(expr,expr,...). NOT has the only notation and the usage is clear from above query.

  • Constants (null(), empty())
/companies?name=ne=empty()&products.id=null()

Above constants help, when it's needed to filter by an empty string or check if a certain field/relation exists. Both of this constants can be used in List operations, which can ease the use of filters in UI, for example.

  • Ordering
/companies?ordering(-name,+products.name)

Ordering is an essential functionality of a good API collection. Django Rest Framework suggests to explicitly specify fields, trough which ordering can be done and that's a great idea as ordering can be a super costly operation for you DB. The problem is that DRF partially breaks DRY principle, as we need to declare some filters twice (or even thrice if we also specify search fields). And if ORM field names do not map to API names, the situation gets even worse, as you need to duplicate mappings. Django-RQL solves all of these problems with declarative configurations, which allow to explicitly specify for each field if it can be ordered or searched by.

  • Select
/companies?select(+owners,-products)

If you have read up to this point, then you already know that Django-RQL is a pretty powerful tool, which could be helpful for your project. But the real power 💪🏽 of this framework is in Select Functionality that allows to include/exclude a set of fields with optimization of SQL querying at the same time! Please, read the dedicated paragraph below to understand how it works and why it could save you lot's of money and DB computational time in a long run.

Power of Select

Select operator allows to dynamically affect object API representation through filters in query string. There are several DRF libraries, that provide similar basic functionality, but Django-RQL goes beyond simple result filtering. Framework allows you to declare configurations in a way, that even SQL querying is optimized automatically! Some collections may rely on millions of records in DB and in such cases, select becomes framework's killer feature. Let's look at it in action (if you want to try it yourself, I have created a django_rql_select_example repository specially for you).
Let's say we want to create Products REST API, based on DRF and Django-RQL. We start with creation of the following models:

We also create DRF serializers for our API:

And we describe our declarative configuration for RQL filters:

We setup settings, views, urls, run migrations, load fixtures and at last run the server. Let's make some queries already!

curl -s "127.0.0.1:8000/api/products/" | jq '. | length'

We will see that there are 3 items in collection. Let's add a filter and get all products, which start with "Connect".

curl -s "127.0.0.1:8000/api/products/?like(name,Connect*)" | jq '.'

You will see the following result (related SQL):

[
  {
    "id": 1,
    "name": "Connect",
    "category": {
      "id": 1,
      "name": "Cloud Applications"
    }
  },
  {
    "id": 2,
    "name": "Connect Zapier Extension",
    "category": {
      "id": 1,
      "name": "Cloud Applications"
    }
  }
]

If we want to include Company of each Product into response, we will make the following query:

curl -s "127.0.0.1:8000/api/products/?like(name,Connect*)&select(company)" | jq '.'

And we get the expected result (related SQL):

[
  {
    "id": 1,
    "name": "Connect",
    "category": {
      "id": 1,
      "name": "Cloud Applications"
    },
    "company": {
      "id": 1,
      "name": "CloudBlue Connect"
    }
  },
  {
    "id": 2,
    "name": "Connect Zapier Extension",
    "category": {
      "id": 1,
      "name": "Cloud Applications"
    },
    "company": {
      "id": 1,
      "name": "CloudBlue Connect"
    }
  }
]

For more examples, please, refer to the library tests or to the provided demo.

OpenAPI 3.0 support

One of the key features of Django-RQL is the support for automatic generation of OpenAPI specification. An example of automatically generated specification can be seen in REST API section of CloudBlue Connect Community portal. Automation is great, but human-written explanations often bring enormous value to customers, so framework allows simple customization of any filter through openapi section in the declarative configuration. Customization example:

{
   'filter': 'published.at',
   'openapi': {
       'required': True,
       'deprecated': True,
       'description': 'Good description',
       'hidden': False,
       'type': 'string',
       'format': 'date',
   }
}

Summary

It is a pretty long article already and I guess it's the perfect time to do a short summary. If you are looking for a powerful easy-to-use API filtering solution for your Django project, I strongly recommend to try Django-RQL. You can easily install it from Pypi (Python 3.6+ is supported) and start using in a matter of minutes:

pip install django-rql

Thank you for you time!


If you have any business or technical questions and suggestions, let's discuss them here, in Github or 🤙 me in LinkedIn.

💖 💪 🙅 🚩
maxipavlovic
Max Kolyubyakin

Posted on June 1, 2020

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related