Django Rest API with Elastic Search
Md. Mahmudul Huq
Posted on September 18, 2022
Before I show you how to build an Elasticsearch-backed REST API with Django, please read my previous article to install Elasticsearch on your machine first. https://mahmudtopu3.medium.com/play-with-elastic-search-with-python-django-on-ubuntu-part-1-d063af7edc00
Why do we need this? Typical filters in Django use SQL LIKE-style operators to search for terms. That works when the keyword matches exactly, but for long sentences or misspelled keywords it fails to return any results. Postgres's trigram search or Django's SearchVector work to some extent but can be very slow. For search and analytics, Elasticsearch is the king.
In this tutorial we will learn how to implement fuzzy search, the object field and nested field of Elasticsearch, a ManyToMany field, integration with a Django REST API, and dynamic Q objects. Most importantly, we will not be using too many third-party packages for this. This is just a beginning. I struggled a lot and had to go through numerous articles and videos; combining those pieces in one place is the motivation for this article. Please follow me.
Here is the GitHub repo. Clone it and follow the instructions to run it on your system. After completing step 9 you can test it. However, I will explain the code now.
Before I get started, here is a simple snapshot of our data model. An Employee has personal info, the company (ForeignKey) in which he/she works, multiple courses (ManyToMany), and skills with a level (a separate Skills model). Check out the file https://github.com/mahmudtopu3/django_elastic/blob/master/hrm/models.py
We want to search by company name, profession, courses, and skills with levels. To implement Elasticsearch we need to create a document class, which describes how each record is indexed.
analyzers.py: We need a couple of Elasticsearch token filters, so we create an analyzer object whose filter list depends on the installed version.
from elasticsearch_dsl import analyzer
from elasticsearch_dsl import __version__

__all__ = (
    'html_strip',
)

# The ``standard`` filter has been removed in Elasticsearch 7.x.
if __version__[0] >= 7:
    _filters = ["lowercase", "stop", "snowball"]
else:
    _filters = ["standard", "lowercase", "stop", "snowball"]

html_strip = analyzer(
    'html_strip',
    tokenizer="standard",
    filter=_filters,
    char_filter=["html_strip"]
)
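To get a feel for what this analyzer does to a piece of text, here is a rough plain-Python approximation. It is only a sketch, not how Elasticsearch implements it: the function name rough_html_strip_analyze and the short stop-word list are my own assumptions, and the snowball stemming step is left out on purpose.

```python
import re

# Small illustrative stop-word list (assumption); the built-in English list is longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "is"}


def rough_html_strip_analyze(text):
    """Approximate html_strip + standard tokenizer + lowercase + stop.

    Stemming (the snowball filter) is deliberately not reproduced here.
    """
    # char_filter step: replace anything that looks like an HTML tag with a space
    without_tags = re.sub(r"<[^>]+>", " ", text)
    # tokenizer step: split into runs of letters and digits
    tokens = re.findall(r"[A-Za-z0-9]+", without_tags)
    # filter steps: lowercase, then drop stop words
    lowered = [token.lower() for token in tokens]
    return [token for token in lowered if token not in STOP_WORDS]


print(rough_html_strip_analyze("<p>Introduction to <b>Database</b> Systems</p>"))
# ['introduction', 'database', 'systems']
```

The real analyzer would additionally stem each token, so the stored terms can differ from the ones printed here.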
Now the main part, the document class. The django_elasticsearch_dsl package offers a number of field types, such as TextField, IntegerField, ObjectField, NestedField, and KeywordField. Whatever we use, Elasticsearch stores each record as a JSON document. This class is the blueprint of that document.
Here our class HRMDocument has the @registry.register_document decorator, which hooks up signals that update or delete the indexed document whenever the model instance changes. In other words, it keeps the Elasticsearch node in sync with the Django database.
from django_elasticsearch_dsl import Document, fields
from django_elasticsearch_dsl.registries import registry

from .analyzers import html_strip
from hrm.models import (
    Company, Courses, Employee, Skills
)


@registry.register_document
class HRMDocument(Document):
    class Index:
        name = 'hrm_index'
        settings = {
            'number_of_shards': 1,
            'number_of_replicas': 0,
        }

    name = fields.TextField(
        attr='name',
        fields={
            'raw': fields.KeywordField(),
            'suggest': fields.Completion(),
        }
    )
    company = fields.ObjectField(
        properties={
            'name': fields.TextField(),
            'country': fields.TextField(),
        }
    )
    current_job = fields.TextField(attr='current_job')
    year_of_experience = fields.FloatField()
    courses = fields.TextField(
        attr='courses_indexing',
        analyzer=html_strip,
        fields={
            'raw': fields.KeywordField(multi=True),
            'suggest': fields.CompletionField(multi=True),
        },
        multi=True
    )
    skills = fields.NestedField(
        attr='skills_indexing',
        properties={
            'id': fields.IntegerField(),
            'name': fields.TextField(),
            'level': fields.TextField(
                analyzer=html_strip,
                fields={
                    'raw': fields.KeywordField(),
                },
            ),
        },
    )

    class Django:
        model = Employee
        fields = [
            'id',
        ]
        related_models = [Company]

    def get_queryset(self):
        return super().get_queryset().select_related(
            'company'
        )

    def get_instances_from_related(self, related_instance):
        if isinstance(related_instance, Company):
            return related_instance.employees.all()
        elif isinstance(related_instance, Skills):
            return related_instance.skills
We name our index hrm_index, with 1 shard and 0 replicas because we will run it on a single node.
Here we have created fields like this:
name = fields.TextField(
    attr='name',
    fields={
        'raw': fields.KeywordField(),
        'suggest': fields.Completion(),
    }
)
We create an object of the TextField class, which comes from the django_elasticsearch_dsl package. Here attr is the model field name or any model property. fields defines the sub-fields of the mapping; for a TextField we can add a raw keyword sub-field for exact matching and a suggest sub-field for autosuggestions.
In the Employee model, company is a ForeignKey, so we store it as an ObjectField in Elasticsearch.
company = fields.ObjectField(
    properties={
        'name': fields.TextField(),
        'country': fields.TextField(),
    }
)
As you can see, the Employee model references the Company model through a foreign key, so our document field is an ObjectField whose properties are the fields of the Company model. Each property can be any type of field and can nest as deep as you want; after all, it will be converted into a JSON object.
Here we store the ManyToMany field as a TextField that holds a list of course names (multi=True).
courses = fields.TextField(
    attr='courses_indexing',
    analyzer=html_strip,
    fields={
        'raw': fields.KeywordField(multi=True),
        'suggest': fields.CompletionField(multi=True),
    },
    multi=True
)
A lot going on there? Here we use our analyzer html_strip and define two sub-fields, one for keyword search and one for autosuggest. Here attr is courses_indexing, which is a property of the Employee model that returns the list of course names. multi is True because the field holds multiple values.
hrm/models.py
# Inside the Employee model class
@property
def courses_indexing(self):
    """
    Course names for indexing.
    Used in Elasticsearch indexing.
    """
    return [course.name for course in self.courses.all()]
Now we want to store our Skills as a NestedField, but why?
With an ObjectField, Elasticsearch flattens a list of objects into separate lists of values per sub-field, so the link between one skill's name and its level is lost and queries can return irrelevant results. A nested field instead indexes each inner object as its own hidden document inside the same index, so every skill is matched as a unit. Follow this article https://opster.com/guides/elasticsearch/data-architecture/elasticsearch-nested-field-object-field/
skills = fields.NestedField(
    attr='skills_indexing',
    properties={
        'id': fields.IntegerField(),
        'name': fields.TextField(),
        'level': fields.TextField(
            analyzer=html_strip,
            fields={
                'raw': fields.KeywordField(),
            },
        ),
    },
)
We don't actually need html_strip here; I forgot to update the repo.
skills_indexing is a property on the Employee model which returns the related skill objects.
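To see why flattening matters for skills, here is a rough plain-Python sketch with no Elasticsearch involved. The employee data and the helper names flatten_object_field, object_match and nested_match are made up for illustration; they only imitate how an object field and a nested field behave when we ask for "java at EXPERT level".

```python
employee = {
    "name": "Example Employee",  # hypothetical sample data
    "skills": [
        {"name": "python", "level": "EXPERT"},
        {"name": "java", "level": "BEGINNER"},
    ],
}


def flatten_object_field(doc, field):
    """Imitate an object field: a list of objects becomes one value list per sub-field."""
    flat = {}
    for item in doc[field]:
        for key, value in item.items():
            flat.setdefault(f"{field}.{key}", []).append(value)
    return flat


def object_match(doc, field, **wanted):
    """Each condition is checked against the flattened lists independently."""
    flat = flatten_object_field(doc, field)
    return all(value in flat.get(f"{field}.{key}", []) for key, value in wanted.items())


def nested_match(doc, field, **wanted):
    """All conditions must hold inside the same inner object."""
    return any(
        all(item.get(key) == value for key, value in wanted.items())
        for item in doc[field]
    )


print(flatten_object_field(employee, "skills"))
# {'skills.name': ['python', 'java'], 'skills.level': ['EXPERT', 'BEGINNER']}
print(object_match(employee, "skills", name="java", level="EXPERT"))  # True (a false positive)
print(nested_match(employee, "skills", name="java", level="EXPERT"))  # False
```

The employee is a java beginner, yet the flattened version still matches "java" and "EXPERT" because the two values come from different skills. That is the irrelevant result the nested field avoids.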
In documents.py you can see an inner class named Django:
    class Django:
        model = Employee
        fields = [
            'id',
        ]
        related_models = [Company]

    def get_queryset(self):
        return super().get_queryset().select_related(
            'company'
        )

    def get_instances_from_related(self, related_instance):
        if isinstance(related_instance, Company):
            return related_instance.employees.all()
        elif isinstance(related_instance, Skills):
            return related_instance.skills
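Here model is the model class, in our case Employee. The fields list names the model fields that should be mapped automatically from their Django field types, without declaring them by hand as we did for the fields above. The related_models list contains ForeignKey models whose changes should trigger a re-index of the related employees.
get_queryset uses select_related so the company is fetched together with each employee during indexing, and get_instances_from_related tells the registry which Employee instances to re-index when a related object changes.
We added two signals on the Skills model to update the index when a skill is saved or deleted.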
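from django.db.models.signals import post_delete, post_save
from django.dispatch import receiver
from django_elasticsearch_dsl.registries import registry

@receiver(post_save, sender=Skills)
def update_skills(sender, instance, created, **kwargs):
    # Re-index the employee that owns the saved skill
    registry.update(instance.employee)

@receiver(post_delete, sender=Skills)
def delete_skills(sender, instance, using, **kwargs):
    registry.update(instance.employee)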
We need a serializer for the REST API. Check out the simple model serializer in serializers.py, which uses depth 1.
import operator
from functools import reduce

from django.http import HttpResponse
from rest_framework import viewsets
from rest_framework.pagination import PageNumberPagination
from rest_framework.views import APIView
from elasticsearch_dsl import Q as QQ

from hrm.documents import HRMDocument
from hrm.models import Employee
from hrm.serializers import EmployeeSerializer


class EmployeeElasticSearchAPIView(APIView, PageNumberPagination):
    serializer_class = EmployeeSerializer
    document_class = HRMDocument

    def get(self, request):
        # print(request.META['QUERY_STRING'])
        try:
            finalquery = []
            q = request.GET.get('search', None)
            company = request.GET.get('company', None)
            courses = request.GET.get('courses', None)
            skills = request.GET.get('skills', None)
            level = request.GET.get('level', None)
            exp_gte = request.GET.get('exp_gte', None)
            exp_lte = request.GET.get('exp_lte', None)

            if q is not None and not q == '':
                finalquery.append(QQ(
                    'multi_match',
                    query=q,
                    fields=[
                        'name',
                        'current_job',
                    ],
                    fuzziness='auto'))

            if company is not None and not company == '':
                finalquery.append(QQ(
                    'match_phrase',
                    company__name=company,
                ))

            if courses is not None and not courses == '':
                finalquery.append(
                    QQ(
                        'multi_match',
                        query=courses,
                        fields=[
                            'courses',
                        ],
                        fuzziness='auto'),
                )

            if skills is not None and not skills == '':
                finalquery.append(
                    QQ(
                        'nested',
                        path="skills",
                        query=QQ("match_phrase", skills__name=skills.lower()),)
                )

            if len(finalquery) > 0:
                response = self.document_class.search().extra(size=10000).query(
                    reduce(operator.iand, finalquery)).to_queryset()
                if exp_gte is not None and not exp_gte == '' and exp_lte is not None and not exp_lte == '':
                    response = response.filter(
                        year_of_experience__gte=exp_gte, year_of_experience__lte=exp_lte)
                if skills is not None and not skills == '' and level is not None and not level == '':
                    response = response.filter(
                        skills__name__icontains=skills, skills__level=level)
                print(response)
            else:
                response = Employee.objects.all().order_by('-id')

            results = self.paginate_queryset(response, request, view=self)
            serializer = self.serializer_class(results, many=True)
            return self.get_paginated_response(serializer.data)
        except Exception as e:
            return HttpResponse(e, status=500)
Most of this is standard Django and Django Rest Framework. I assume you know these well (APIView, pagination, etc.).
elasticsearch_dsl provides a Q function to create Q lookup queries. We import it under the alias QQ to avoid a name conflict between Django's Q and the Elasticsearch DSL Q.
We set document_class = HRMDocument on the view; all of our queries and filters run through this document class.
We can have many parameters, so we turn each query param into its own Q lookup and then combine them into a single lookup. For that we use the reduce function.
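The combining step can be sketched without Elasticsearch at all. Below, Clause is a hypothetical stand-in I made up for a query object; it only imitates the idea that and-ing two lookups yields one bool query whose must list holds both. The field names are the ones from our document, but the class itself is an assumption, not the library's code.

```python
import operator
from functools import reduce


class Clause:
    """Hypothetical stand-in for a query object that supports the & operator."""

    def __init__(self, body):
        self.body = body

    def _must(self):
        # An existing bool/must clause contributes its members, anything else itself
        if set(self.body) == {"bool"} and set(self.body["bool"]) == {"must"}:
            return list(self.body["bool"]["must"])
        return [self.body]

    def __and__(self, other):
        # Merge both sides into a single bool/must list
        return Clause({"bool": {"must": self._must() + other._must()}})


clauses = [
    Clause({"multi_match": {"query": "software", "fields": ["name", "current_job"], "fuzziness": "auto"}}),
    Clause({"match_phrase": {"company.name": "Daffodil Family"}}),
    Clause({"multi_match": {"query": "Database", "fields": ["courses"], "fuzziness": "auto"}}),
]

# reduce applies & pairwise: (clauses[0] & clauses[1]) & clauses[2]
combined = reduce(operator.and_, clauses)
print(len(combined.body["bool"]["must"]))  # 3
```

The view uses operator.iand rather than operator.and_. For objects that do not define an in-place and, Python falls back to the regular & operator, so both spellings give the same result in this sketch.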
Let’s see one for search param
if q is not None and not q == '':
    finalquery.append(QQ(
        'multi_match',
        query=q,
        fields=[
            'name',
            'current_job',
        ],
        fuzziness='auto')
    )
Here we append a QQ lookup. The first argument is the Elasticsearch query type, i.e. multi_match; query=q passes in the value of the q variable; the fields list names the fields the query will search; and fuzziness is auto, which returns results even when a token is misspelled or only similarly spelled, as long as it is within a small number of character edits.
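To make the fuzziness idea concrete, here is a small plain-Python sketch of edit distance. It assumes the default thresholds of fuzziness AUTO (terms of 1 to 2 characters must match exactly, 3 to 5 characters allow one edit, longer terms allow two). The helper names are mine, and this uses plain Levenshtein distance, whereas Elasticsearch by default also counts swapping two adjacent characters as a single edit.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def auto_max_edits(term):
    """Edit budget for fuzziness AUTO, assuming the default thresholds 3 and 6."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


def fuzzy_match(query_term, indexed_term):
    return levenshtein(query_term, indexed_term) <= auto_max_edits(query_term)


print(levenshtein("softwae", "software"))  # 1
print(fuzzy_match("softwae", "software"))  # True
print(fuzzy_match("py", "pi"))             # False, short terms must match exactly
```

This is why a query such as search=softwae can still find employees whose job title contains "software".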
if skills is not None and not skills == '':
    finalquery.append(
        QQ(
            'nested',
            path="skills",
            query=QQ("match_phrase", skills__name=skills.lower()),)
    )
For a nested field we need a nested query, so we use the query type "nested" together with path, which is the skills NestedField in our document.
query takes another QQ lookup; here we match a phrase against the nested field. We use __ to reach a field inside the object, so skills__name refers to skills.name.
Next we check whether our finalquery list contains any QQ lookups.
If it does, we combine the lookups using the reduce function with the and operator and run the query:
response = self.document_class.search().extra(size=10000).query( reduce(operator.iand, finalquery)).to_queryset()
For the experience range and skill level we can use normal Django filters, because the data is first searched in Elasticsearch and then mapped into a Django queryset by to_queryset().
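For reference, this is roughly the JSON body that such a nested lookup corresponds to in the Elasticsearch query DSL. The sketch below builds it by hand in plain Python; the helper names to_field_path and nested_match_phrase are my own and are not part of any library.

```python
import json


def to_field_path(lookup):
    """Turn a Django-style lookup such as skills__name into the dotted path skills.name."""
    return lookup.replace("__", ".")


def nested_match_phrase(path, lookup, value):
    """Build the raw JSON body of a nested query wrapping a match_phrase query."""
    return {
        "nested": {
            "path": path,
            "query": {"match_phrase": {to_field_path(lookup): value}},
        }
    }


body = nested_match_phrase("skills", "skills__name", "Elastic Search".lower())
print(json.dumps(body, indent=2))
```

Printing the body is a handy way to compare what you meant to ask with what is actually sent to the node.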
Finally we return paginated data in a normal way.
We dynamically mapped query params into a combined lookup.
We can do a lot of things in a lot of ways. I will update the repo and article from time to time.
Here are some queries you can do.
http://127.0.0.1:8000/hrm/all-employees?courses=Databas
http://127.0.0.1:8000/hrm/all-employees?search=engineer&skills=elastic&level=INTERMEDIATE
http://127.0.0.1:8000/hrm/all-employees?search=developer&skills=python
Also try these queries
http://127.0.0.1:8000/hrm/all-employees?search=software&company=Daffodil%20Family
http://127.0.0.1:8000/hrm/all-employees?search=softwae
http://127.0.0.1:8000/hrm/all-employees?search=software&courses=Business%20Communication
http://127.0.0.1:8000/hrm/all-employees?search=software&courses=Database
http://127.0.0.1:8000/hrm/all-employees?search=software&company=Daffodil%20Family&courses=Database
http://127.0.0.1:8000/hrm/all-employees?search=softwre&courses=Complr
http://127.0.0.1:8000/hrm/all-employees?search=softwre&courses=Dataase
http://127.0.0.1:8000/hrm/all-employees?search=softwre&skills=Elastic%20Search
http://127.0.0.1:8000/hrm/all-employees?search=engineer&skills=python
http://127.0.0.1:8000/hrm/all-employees?search=developer&skills=python
http://127.0.0.1:8000/hrm/all-employees?courses=Accounting
http://127.0.0.1:8000/hrm/all-employees?search=software&exp_gte=3&exp_lte=5
http://127.0.0.1:8000/hrm/all-employees?search=software&exp_gte=4.1&exp_lte=5
http://127.0.0.1:8000/hrm/all-employees?search=engineer&skills=java
http://127.0.0.1:8000/hrm/all-employees?search=engineer&skills=java&level=INTERMEDIATE
http://127.0.0.1:8000/hrm/all-employees?search=engineer&skills=elastic&level=INTERMEDIATE
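If you would rather generate these URLs than type them, a small helper using only the standard library can do it. This is just a sketch: build_search_url is a name I made up, and BASE_URL assumes the local development server used in the examples above.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://127.0.0.1:8000/hrm/all-employees"  # assumes the local dev server


def build_search_url(**params):
    """Drop empty values and percent-encode the rest (spaces become %20)."""
    cleaned = {key: value for key, value in params.items() if value not in (None, "")}
    return f"{BASE_URL}?{urlencode(cleaned, quote_via=quote)}"


url = build_search_url(search="software", company="Daffodil Family", courses="")
print(url)
# http://127.0.0.1:8000/hrm/all-employees?search=software&company=Daffodil%20Family
```

Empty parameters are skipped here because the view ignores them anyway.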
If you find this article helpful, then please give a thumbs up.
Follow my LinkedIn
https://www.linkedin.com/in/md-mahmudul-huq/