Integrating Django with PostgreSQL Materialized Views

4bdulll4

Abdulla

Posted on June 1, 2023

Integrating Django with PostgreSQL Materialized Views

Django is one of the most popular web frameworks that is written in Python. It follows the principle of Don’t Repeat Yourself (DRY) and promotes rapid development by providing a set of tools, libraries, and conventions that handle common web development tasks.

What makes Django powerful is that it has built-in support for ORM that is developed and maintained by Django developers. The data model in Django is represented as Python classes in which the database tables will be designed and queried without the usage of raw SQL queries.

In this article, we will explore how to integrate Django with PostgreSQL materialized views as we will customize Django’s ORM in a way to fully support materialized views by defining materialized view models in the Django project such that the model changes could be detected by Django’s migration system.

Table of Contents

What are Materialized Views?

Views in database engines are basically virtual tables that provide a way to query and retrieve data from existing tables. They allow us to simplify complex queries. In addition, since views dynamically execute the primary query each time they have been accessed, they sometimes come with a performance tradeoff.

This is where materialized views come into play. Materialized views on the other hand are similar to regular views. The only difference they have with views is that they store the results of the underlying query in a physical table. In other words, materialized views compile the result of the query and persist them as a separate entity in the database. As a result, accessing materialized views are way faster than re-computing the query each time they are accessed.

However, materialized views also come with some drawbacks that need to be considered during development such as the following:

  1. Data in materialized views may become outdated if not refreshed regularly, potentially leading to inconsistencies.
  2. Storage requirements need to be considered as materialized views consume disk space to store their data.
  3. Adds complexity to the system as it needs maintenance when the underlying tables where updated.

Creating Materialized View in PostgreSQL

A materialized view can be created using the following SQL command:

CREATE MATERIALIZED VIEW popular_posts AS  
SELECT * FROM posts WHERE rating > 200;
Enter fullscreen mode Exit fullscreen mode

This query will create a new materialized view from the underlying query where the result is cached. Furthermore, the updated data in the original table will not automatically be accessible in the materialized view as it needs to be manually refreshed using the following command:

REFRESH MATERIALIZED VIEW popular_posts;

This command will refresh the materialized view and populate the latest data of the underlying table as it will recompile the underlying query definition of the materialized view.

The Problem

As currently Django’s ORM only supports database tables, I started wondering if it is possible to customize Django’s ORM to support materialized views. After conducting a research, I found out that there are a few ways to achieve this but I wasn’t satisfied with the solutions as none of the customizations could fully support materialized views to be used as Django models.

One of the solutions involved manually creating an empty migration file and specifying the SQL query for creating the materialized view using the RunSQL command.

from django.db import migrations  


class Migration(migrations.Migration):  

    dependencies = [  
        ("api", "0001_initial"),  
    ]  

    operations = [  
        migrations.RunSQL(  
            sql="CREATE MATERIALIZED VIEW popular_posts AS SELECT * FROM posts WHERE rating > 200",  
            reverse_sql="DROP MATERIALIZED VIEW popular_posts"  
        )  
    ]
Enter fullscreen mode Exit fullscreen mode

Additionally, a model class needed to be created, defining the fields exactly the same as the materialized view query. The model class has to have the managed option set to false in its meta class definition, indicating that the database schema should not be managed by Django in order to prevent the migration system to create a new table and the db_table option explicitly set to match the name of the materialized view.

class PopularPosts(models.Model):  
    title = models.CharField(max_length=200)  
    content = models.TextField()  
    author = models.ForeignKey(User, on_delete=models.CASCADE)  
    tag_names = models.CharField(max_length=200)  
    created_at = models.DateTimeField(auto_now_add=True)  
    updated_at = models.DateTimeField(auto_now=True)  
    rating = models.PositiveIntegerField()  

    class Meta:  
        managed = False  
        db_table = 'popular_posts'
Enter fullscreen mode Exit fullscreen mode

In conclusion, this approach in my opinion is not considered an optimal solution and efficient in the long run, especially when dealing with frequent updates or changes of the model class attributes. Each time a change in model class attributes is required, the new raw query needs to be rewritten in a new migration file which can be time-consuming and prone to errors. Moreover, this can become a cumbersome task, especially as the application grows and evolves over time.

Integrating Materialized Views with Django

In order to be able to integrate Django with materialized views, a few customizations will be applied on our Django example project including:

  1. Creating custom model class
  2. Creating custom model field class
  3. Creating custom database engine for PostgreSQL
  4. Wrapping up all together with a sample project

Step 1: Create a Custom Model Class

The first step to integrating Django with materialized views in PostgreSQL is to create a custom model class to inform Django that the model is intended to be a materialized view.

There are multiple ways to tell if a model class is materialized view, I designed the model class in a way to accept two custom class attributes inside the Meta class of the model class that are materialized_view and view_parent_model

  • materialized_view: Indicates that the model class is a materialized view, not a table.
  • view_parent_model: Instructs the model class which actual model class should be used to generate the underlying query for the materialized view.
class MaterializedViewModel(models.Model):  

   class Meta:  
      materialized_view = True  
      view_parent_model = 'app_label.Model'
Enter fullscreen mode Exit fullscreen mode

Django usually does not allow defining custom attributes on its Meta class inside the model classes as it will raise a TypeError exception:


raise TypeError(  
  TypeError: 'class Meta' got invalid attribute(s): materialized_view

Enter fullscreen mode Exit fullscreen mode

After conducting research, I discovered that Django only allows Meta attributes that are statically defined within the DEFAULT_NAMES tuple which lives in the django.db.models.options module. To address this limitation, I implemented a workaround by importing the options module and overriding the DEFAULT_NAMES variable inside the __init__.py file of my project module before populating Django apps. This modification enables support for custom attributes in the Meta class.

# djangoProject/__init__.py  

import django.db.models.options as options  

options.DEFAULT_NAMES += ('materialized_view', 'view_parent_model',)
Enter fullscreen mode Exit fullscreen mode

Step 2: Create a Custom Model Field

The next step is to create a custom model field specifically designed for materialized view model class that will have two custom attributes. Furthermore, each of the custom fields specified within the model class will subsequently be transformed into raw SQL queries to generate the SQL required for the creation of the materialized view.

from django.db.models import fields  
from django.db.models.expressions import Combinable, ExpressionWrapper, F  


class MaterializedViewField(fields.Field):  

    def __init__(self, child, source=None, **kwargs):  
        super().__init__(**kwargs)  
        self.child = child  

        if isinstance(source, Combinable) or source is None:  
            self.source = source  

        elif isinstance(source, str):  
            self.source = ExpressionWrapper(F(source), output_field=child)  

        else:  
            self.source = None  

    def deconstruct(self):  
        """  
            Overriding the deconstruct method to include the custom field attributes in   
            migration files while executing `makemigrations` command on the materialized view model.  
        """  
        name, path, args, keywords = super().deconstruct()  

        keywords.update(  
            source=self.source, child=self.child  
        )  

        return name, path, args, keywords
Enter fullscreen mode Exit fullscreen mode

Step 3: Create a Custom Database Engine for PostgreSQL

The first step to integrating Django with materialized views in PostgreSQL is to create a custom database editor. This engine is responsible for implementing the necessary schema customization.

In order to customize Django’s database schema editor, a new database engine needs to be defined in the project and referenced in ENGINE attribute of DATABASES entry in the project’s settings file.

Note that the custom database engine is required to be placed in a directory with a file named base.py and having a class named DatabaseWrapper

Refer to Django’s documentation for more information.

The following is a sample of my folder structure where I placed the database engine in a Django app named core and placed the DB engine in the backends subfolder.

project_root/  
   ...  
   django_project/  
   __init__.py  
   settings.py  

   core/  
      __init__.py  
      backends/  
         __init__.py  
         db/  
            __init__.py  
            base.py
Enter fullscreen mode Exit fullscreen mode

The following code is an example of a custom database engine where it is located in base.py file.

from django.apps import apps  
from django.db.backends.postgresql import base  
from django.db.models import QuerySet, options, Model  

from api.fields import MaterializedViewField  


class DatabaseSchemaEditor(base.DatabaseSchemaEditor):  
    sql_create_materialized_view = "CREATE MATERIALIZED VIEW %(table)s AS %(definition)s"  
    sql_delete_materialized_view = "DROP MATERIALIZED VIEW %(table)s"  
    sql_refresh_materialized_View = "REFRESH MATERIALIZED VIEW %(concurrently)s %(view)s"  

 @staticmethod  
    def model_meta(model: type[Model]) -> options.Options:  
        return model._meta  

    def _get_parent_model(self, model: type[Model]):  
        """  
            Returns the underlying view model that will be used to generate materialized view's SQL.  
        """  
        parent_model = getattr(self.model_meta(model), 'view_parent_model', None)  

        if parent_model:  
            return apps.get_model(*parent_model.split('.'))  

    def model_is_materialized_view(self, model: type[Model]) -> bool:  
        """Checks if the model class is a materialized view model or a regular django model."""  
        return getattr(self.model_meta(model), 'materialized_view', False)  

    def get_queryset(self, model: Model, extra_field=None):  
        """Generates the queryset out of the provided parent model and the provided fields."""  

        def append_field(_model_field):  

            if _model_field.source is None:  
                concrete_fields.append(_model_field.name)  
            else:  
                annotation_fields.update({_model_field.attname: _model_field.source})  

        concrete_fields = []  
        annotation_fields = dict()  

        for field_name, field in model.__dict__.items():  
            if hasattr(field, 'field'):  
                model_field: MaterializedViewField = field.field  

                if isinstance(model_field, MaterializedViewField):  
                    append_field(model_field)  

        if extra_field:  
            append_field(extra_field)  

        return QuerySet(  
            model=self._get_parent_model(model)  
        ).only(*concrete_fields).annotate(**annotation_fields).query  

    def create_model(self, model, extra_field=None):  
        if self.model_is_materialized_view(model):  
            sql = self.sql_create_materialized_view % {  
                'table': self.quote_name(self.model_meta(model).db_table),  
                'definition': self.get_queryset(model, extra_field=extra_field)  
            }  
            self.execute(sql)  
        else:  
            super().create_model(model)  

    def add_field(self, model: Model, field):  

        if self.model_is_materialized_view(model):  
            setattr(model, field.attname, field)  
            self.delete_model(model)  
            self.create_model(model, extra_field=field)  

        else:  
            super().add_field(model, field)  

    def remove_field(self, model, field):  

        if self.model_is_materialized_view(model):  
            delattr(model, field.attname)  
            self.delete_model(model)  
            self.create_model(model)  
        else:  
            super().remove_field(model, field)  

    def alter_field(self, model, old_field, new_field, strict=False):  

        if self.model_is_materialized_view(model):  
            delattr(model, old_field.attname)  
            self.delete_model(model)  
            self.create_model(model, extra_field=new_field)  

        else:  
            super().alter_field(model, old_field, new_field, strict)  

    def delete_model(self, model):  
        if self.model_is_materialized_view(model):  
            self.execute(  
                self.sql_delete_materialized_view % {  
                    "table": self.model_meta(model).db_table  
                }  
            )  
        else:  
            super().delete_model(model)  

    def refresh_materialized_view(self, model: type[Model], concurrent=False):  
        """  
            Performs materialized view refresh query if it was desired to  
            populate the view data on demand.  
        """  
        self.execute(self.sql_refresh_materialized_View % {  
            'view': model._meta.db_table,  
            'concurrently': 'CONCURRENTLY' if concurrent else ''  
        })  


class DatabaseWrapper(base.DatabaseWrapper):  
    SchemaEditorClass = DatabaseSchemaEditor 
Enter fullscreen mode Exit fullscreen mode

The above code will override Django’s built-in postgres database schema class to support executing queries that are necessary to create materialized views once migrate command has been executed.

In short, all the database operation methods have been modified to include a check to determine whether the model is a materialized view model or a regular model class. If it is a materialized view model, the get_queryset method will be invoked to generate the raw SQL query from the parent model for creating or updating the materialized view. Otherwise, the parent method will be called as usual.

Finally, Add the custom database engine to the project’s database settings as the following:

DATABASES = {  
    'default': {  
       'ENGINE': 'core.backends.db',  
       'HOST': '<db-host>',  
       'NAME': '<db-name>',  
       'USER': '<user>',  
       'PASSWORD': '<passwrod>',  
       'PORT': 5432,  
       'ATOMIC_REQUESTS': True,  
    }  
}
Enter fullscreen mode Exit fullscreen mode

Step 4: Putting it all together

Now that we have explained how to integrate materialized view with Django, let us demonstrate an example usage with a few model samples. The model definition is basically a simple post model with many-to-many relation with comments and tags models as shown in the example below:

class Tag(models.Model):  
    name = models.CharField(max_length=100)  

    def __str__(self):  
        return self.name  

class Comment(models.Model):  
    post = models.ForeignKey(Post, on_delete=models.CASCADE, related_name='comments')  
    author = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)  
    content = models.TextField()  
    created_at = models.DateTimeField(auto_now_add=True)  

    def __str__(self):  
        return f'Comment by {self.author.username} on {self.post.title}'  


class Post(models.Model):  
    title = models.CharField(max_length=200)  
    content = models.TextField()  
    author = models.ForeignKey(User, on_delete=models.CASCADE)  
    tags = models.ManyToManyField(Tag)  
    created_at = models.DateTimeField(auto_now_add=True)  
    updated_at = models.DateTimeField(auto_now=True)  

    def __str__(self):  
        return self.title
Enter fullscreen mode Exit fullscreen mode

Next, the models need to be migrated by using makemigrations and migrate commands sequentially.

$ python manage.py makemigraitons  
$ python manage.py migrate
Enter fullscreen mode Exit fullscreen mode

Once all the models have been migrated to the database, we need to create the materialized view model definition for the Post model.

class MaterializedViewBaseModel(models.Model):  

    class Meta:  
        abstract = True  

    @classmethod  
    def check(cls, **kwargs):  
        errors = super().check(**kwargs)  

        if not hasattr(cls._meta, 'materialized_view'):  
            errors.append(  
                checks.Error(  
                    'The `view` attribute is required in materialized view meta class.',  
                    obj=cls,  
                    id='models.E100',  
                )  
            )  

        view_parent_model = getattr(cls._meta, 'view_parent_model', None)  

        if view_parent_model:  
            try:  
                apps.get_model(*getattr(cls._meta, 'view_parent_model').split('.'))  
            except (LookupError, ValueError) as e:  
                errors.append(  
                    checks.Error(  
                        f"Invalid `view_parent_model` format, {e}", obj=cls, id='models.E101'  
                    )  
                )  
        else:  
            errors.append(  
                checks.Error(  
                    'The `view_parent_model` attribute is required in materialized view meta class.',  
                    obj=cls,  
                    id='models.E101'  
                )  
            )  

        return errors  

    @classmethod  
    def refresh(cls, concurrent=False):  
        with connection.cursor() as cursor:  
            editor = cursor.db.schema_editor()  
            editor.refresh_materialized_view(cls, concurrent=concurrent)  


class PostMaterializedView(MaterializedViewBaseModel):  

    post_id = MaterializedViewField(source='pk', child=models.IntegerField())  
    title = MaterializedViewField(child=models.CharField())  
    content = MaterializedViewField(child=models.CharField())  
    tag_names = MaterializedViewField(  
        source=aggregates.StringAgg('tags__name', delimiter="'; '", distinct=True),  
        child=models.CharField()  
    )  
    comments_count = MaterializedViewField(  
        source=functions.Coalesce(models.Count('comments', distinct=True), 0),  
        child=models.IntegerField()  
    )  
    comment_authors = MaterializedViewField(  
        source=aggregates.StringAgg(  
            'comments__author__first_name', delimiter="', '", distinct=True  
        ),  
        child=models.CharField()  
    )  

    class Meta:  
        view_parent_model = 'api.Post'  
        materialized_view = True  
        constraints = [  
            models.UniqueConstraint(  
                models.F('post_id'), name='unique_post_id',  
            )  
        ]
Enter fullscreen mode Exit fullscreen mode

In the above example, the custom Options of Meta class has been defined to inform the custom database schema editor that the model is a materialized view and generate the SQL query from Post model.

The model field MaterializedViewField has been used to define the model fields providing the child model field and source attribute. It is important to note that the model field has been intentionally designed to support various database expressions mentioned in Django’s query expression documentation including F, StringAggr, Count, Avg, Sum, Case and so on.

Additionally, The materialized abstract class will basically run some checks on the custom Meta class attributes such as validating the existence of both materialized_view and view_parent_model attributes and making sure that the view_parent_model has been defined with the correct format.

Finally, executing the makemigrations and migrate commands sequentially should create a new materialized view that could be easily used in Django to perform query and filter operations.

# Generated by Django 4.2.1 on 2023-05-22 22:10  

import api.fields  
import django.contrib.postgres.aggregates.general  
from django.db import migrations, models  
import django.db.models.aggregates  
import django.db.models.functions.comparison  


class Migration(migrations.Migration):  

    dependencies = [  
        ("api", "0001_initial"),  
    ]  

    operations = [  
        migrations.CreateModel(  
            name="PostMaterializedView",  
            fields=[  
                (  
                    "id",  
                    models.BigAutoField(  
                        auto_created=True,  
                        primary_key=True,  
                        serialize=False,  
                        verbose_name="ID",  
                    ),  
                ),  
                (  
                    "post_id",  
                    api.fields.MaterializedViewField(  
                        child=models.IntegerField(),  
                        source=models.ExpressionWrapper(  
                            models.F("pk"), output_field=models.IntegerField()  
                        ),  
                    ),  
                ),  
                (  
                    "title",  
                    api.fields.MaterializedViewField(  
                        child=models.CharField(), source=None  
                    ),  
                ),  
                (  
                    "content",  
                    api.fields.MaterializedViewField(  
                        child=models.CharField(), source=None  
                    ),  
                ),  
                (  
                    "tag_names",  
                    api.fields.MaterializedViewField(  
                        child=models.CharField(),  
                        source=django.contrib.postgres.aggregates.general.StringAgg(  
                            "tags__name", delimiter="'; '", distinct=True  
                        ),  
                    ),  
                ),  
                (  
                    "comments_count",  
                    api.fields.MaterializedViewField(  
                        child=models.IntegerField(),  
                        source=django.db.models.functions.comparison.Coalesce(  
                            django.db.models.aggregates.Count(  
                                "comments", distinct=True  
                            ),  
                            0,  
                        ),  
                    ),  
                ),  
                (  
                    "comment_authors",  
                    api.fields.MaterializedViewField(  
                        child=models.CharField(),  
                        source=django.contrib.postgres.aggregates.general.StringAgg(  
                            "comments__author__first_name",  
                            delimiter="', '",  
                            distinct=True,  
                        ),  
                    ),  
                ),  
            ],  
            options={  
                "materialized_view": True,  
                "view_parent_model": "api.Post",  
            },  
        ),  
        migrations.AddConstraint(  
            model_name="postmaterializedview",  
            constraint=models.UniqueConstraint(  
                models.F("post_id"), name="unique_post_id"  
            ),  
        ),  
    ]
Enter fullscreen mode Exit fullscreen mode

Note that both source and child attributes of MaterializedViewField have been included in the migration file as we defined them in the field’s deconstruct method.

This is it, we have now successfully integrated Django with PostgreSQL’s Materialized Views.

Next Steps

Although we have managed to introduce a new feature in Django to make use of PostgreSQL’s materialized view, there are a few improvements that we should consider.

  1. As it is clear now that materialized views need regular updates in case the underlying view table has been updated, a mechanism must be designed to keep the materialized view data updated.
  2. More complex examples could be tested on the proposed solution to check its dynamicity for various kinds of queries and use cases.

Conclusion

In conclusion, this article has presented an extensive guide on integrating Django with PostgreSQL’s materialized views. By introducing mechanisms such as overriding the database schema class, designing custom model classes, and designing custom model fields, we have demonstrated how to easily integrate materialized views into Django applications.

With the ability to create and update materialized views using Django’s migration system, we can maintain synchronization between the database schema and model definitions. This separation of regular models from materialized view models enhances code organization and maintainability.

💖 💪 🙅 🚩
4bdulll4
Abdulla

Posted on June 1, 2023

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related