How materialized views saved our bacon


Julius Colliander Celik

Posted on November 16, 2023


The surprising tale of how a seemingly basic challenge had to be solved with materialized views

My team at Acast recently released a new feature ✨ Availability Calendar ✨ in our self-serve advertising platform. The Availability Calendar shows which weeks of the year have available slots for host-read sponsorship on our different podcasts. This lets advertisers plan their campaigns better, resulting in more relevant sponsorship requests, since those requests are now more likely to end in a successful match. It improves both our sales efficiency and the user experience for our advertisers. You can read more about it here.

The biggest challenge when building the Availability Calendar was not functional; we knew how to implement a working solution. The real challenge was making it fast. We started by building a naive solution, then spent two weeks working hard to solve the performance issues we faced. The first version was so slow that we brought down our database, and with it our two main products, with only eight users testing the new feature.

Let us take a look behind the scenes and explore how we used materialized views to release the Availability Calendar. 👀

[Screenshot: the Availability Calendar in action. Instead of simply choosing whether to work with a show, the user chooses which specific weeks to work with it.]

A little background

Team Mads (pronounced Team Ads) keeps track of every ad campaign booked at Acast in a single database table. Think of it as a treasure trove housing various ad types, from pre-recorded ads to host-read sponsorships and promos for our shows. And as you can imagine, there are a LOT of ads in there. The different ad types serve very different use cases, and to make that work we store endless combinations of our 33 different targeting options. You could, for example, target your ad campaign to only appear on comedy shows that are being listened to in Arkansas from mobile devices.

Problem 1 - Inverted Existence Lookup

To figure out if a show has an empty slot for a specific week, you mainly have to look at two things:

  • How many sponsorship slots does this show have available?
  • Which sponsorship ad campaigns target this show in that specific week and in that specific country?

The question "Is this show free in week 45?" actually requires you to answer "Do none of the campaigns target this show in week 45?". It's not enough to find one specific ad campaign; you have to inspect every single ad campaign to determine that nothing targets the slot, as the sketch below shows. Even after taking some shortcuts, the dataset we need to inspect is massive. We have to load the full ad campaign set and inspect every row one by one. Unsurprisingly, this is incredibly memory intensive.
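To make the inversion concrete, here is a minimal sketch. The containment predicate is a simplified stand-in for our real targeting inspection (shown further down); the point is the shape of the question, not the predicate:

-- A positive lookup ("find a campaign targeting this show") can stop
-- scanning at the first match.
SELECT EXISTS (
    SELECT 1
    FROM campaign
    WHERE targeting @> '[{"type": "showIds", "value": ["1dc963a6-d284-44e0-a083-c6074c2d33d3"]}]'::jsonb
);

-- The inverted question ("does NOTHING target this slot?") only comes
-- back true after every campaign has been ruled out, so the whole set
-- has to be scanned.
SELECT NOT EXISTS (
    SELECT 1
    FROM campaign
    WHERE targeting @> '[{"type": "showIds", "value": ["1dc963a6-d284-44e0-a083-c6074c2d33d3"]}]'::jsonb
);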

Problem 2 - Complicated Targeting

To make matters hairier, inspecting a single ad campaign row is surprisingly costly in itself. Targeting rules can be combined with any mix of OR and AND logic. If we didn't support this, we wouldn't be able to cover all the diverse use cases our customers rely on to build successful advertising campaigns.

Our endlessly complex targeting rule-sets make choosing a storage solution difficult. The method we have used so far is a JSON column in our Postgres table, and it has worked for half a decade. In most use cases we read the full value or replace it, and never parse the JSON inside our database queries. So we had never needed Postgres's JSON-parsing functionality, until now. A typical targeting value looks like this:

[
  { "type": "listenerCountries", "value": ["US", "CA"] },
  { "type": "showIds", "value": ["1dc963a6-d284-44e0-a083-c6074c2d33d3"] },
  {
    "type": "dateRange",
    "value": { "startAt": "2023-10-26", "endAt": "2023-12-06" }
  }
]


To determine if an ad campaign targets a specific slot we have to run a query that looks a little like this:

-- Get all targeting rules exploded
WITH targeting_rules AS (
    SELECT
        id as campaign_id,
        jsonb_array_elements(targeting) as targeting_rule
    FROM campaign
),

-- Filter on geo
relevant_geo_targeting AS (
    SELECT
        *
    FROM targeting_rules
    WHERE (
        targeting_rule->>'type' = 'listenerCountries'
        AND 'SE' = ANY(
            SELECT *
            FROM jsonb_array_elements_text(targeting_rule->'value')
        )
    )
),

-- Filter on show
relevant_show_targeting AS (
    SELECT
        *
    FROM targeting_rules
    WHERE (
        targeting_rule->>'type' = 'showIds'
        AND '1dc963a6-d284-44e0-a083-c6074c2d33d3' = ANY(
            SELECT *
            FROM jsonb_array_elements_text(targeting_rule->'value')
        )
    )
)

-- Get all campaigns that match both geo and show
SELECT
    *
FROM campaign
WHERE (
    id = ANY(SELECT campaign_id FROM relevant_geo_targeting)
    AND id = ANY(SELECT campaign_id FROM relevant_show_targeting)
);

As you can see, we have to parse the JSON content of the targeting column for every single campaign before we can determine whether it is relevant to our query. We are also exploding the targeting array, which produces a vastly larger intermediate table.


Exploding a table means taking a list-type column and creating a new table with one row per value in the original column. If we explode the campaigns table on targeting, the resulting table has one row per targeting rule.
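Here is a toy version of that explosion, reusing the targeting value from earlier on a single hand-written row:

-- One campaign row with three targeting rules becomes three rows,
-- one per rule.
SELECT
    id AS campaign_id,
    jsonb_array_elements(targeting) AS targeting_rule
FROM (VALUES (
    1,
    '[{"type": "listenerCountries", "value": ["US", "CA"]},
      {"type": "showIds", "value": ["1dc963a6-d284-44e0-a083-c6074c2d33d3"]},
      {"type": "dateRange", "value": {"startAt": "2023-10-26", "endAt": "2023-12-06"}}]'::jsonb
)) AS campaign(id, targeting);
-- => 3 rows: (1, listenerCountries rule), (1, showIds rule), (1, dateRange rule)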

How bad is it?

So we face two compounding problems: we have to analyze the full dataset to prove a lack of targeting, and that analysis is very expensive per campaign because we repeatedly parse JSON. The median response time is around 2 seconds, which is unacceptably slow.

The situation looks even more dire at the tail. The p99 response time for fetching a single show's availability fluctuates between 2 seconds and 55 seconds! If you don't know how to read that: it means at least 1% of requests take close to a full minute to complete. That's almost a minute during which a database connection is held hostage, unavailable to other users.

The requests aren't just slow. Half a dozen users is enough to bring down our database and all our team's services with it. In other words, it's really bad!


Enter the Hero - Materialized Views

So, what did we do? We zeroed in on a few key insights:

  • The query is costly.
  • JSON-parsing is a culprit in this expense.
  • We must include vast sets of data to confirm an empty slot.
  • The query is repeated often with the same parameters.
  • There are vastly more reads than writes to the campaigns, and the availability query will run often.
  • Data updates need to be somewhat live. At most, updates can be delayed by a couple of minutes.
  • The data lives in Postgres and we want to avoid any major changes to the underlying data storage.

The obvious answer for us was to cache or precompute the results, and we did it with materialized views. A materialized view is a Postgres feature: you give Postgres a query, and Postgres persists the result of that query as a table. Whenever you want to refresh the table, you just run this:

REFRESH MATERIALIZED VIEW [CONCURRENTLY] campaign_targeting_rule_view;

We created two materialized views that together take over the bulk of the original query's workload. The first view (campaign_by_show) explodes the campaigns into one row per campaign and targeted show. The show id is stored in a native Postgres TEXT column, which is much faster to filter on than parsing the targeting JSON object, so all JSON parsing is removed from the read queries. This view makes it really fast to filter campaigns on whether they target any shows, and which shows they target.
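A simplified sketch of what such a view definition can look like (the column names here are illustrative, not our exact schema):

-- Parse the targeting JSON once, at refresh time; readers only ever
-- see native columns.
CREATE MATERIALIZED VIEW campaign_by_show AS
SELECT DISTINCT
    c.id AS campaign_id,
    s.show_id
FROM campaign c
CROSS JOIN LATERAL jsonb_array_elements(c.targeting) AS r(rule)
CROSS JOIN LATERAL jsonb_array_elements_text(
    -- guard so only the value arrays of showIds rules get exploded
    CASE WHEN r.rule->>'type' = 'showIds' THEN r.rule->'value' END
) AS s(show_id);

-- REFRESH ... CONCURRENTLY requires a unique index on the view.
CREATE UNIQUE INDEX ON campaign_by_show (campaign_id, show_id);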

The other view (country_targeting_view) precomputes one row per country and campaign so that we can quickly filter campaigns by country. This view is built on campaign_targeting_view, which makes the refresh much faster (but still slow enough that we want to precompute the results).
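With both views in place, the read side of the availability check reduces to joins on plain indexed columns. Something like this (assuming a country column on the second view):

-- No jsonb_array_elements, no ->> operators: just equality checks that
-- indexes can serve.
SELECT c.*
FROM campaign c
JOIN campaign_by_show cs ON cs.campaign_id = c.id
JOIN country_targeting_view ct ON ct.campaign_id = c.id
WHERE cs.show_id = '1dc963a6-d284-44e0-a083-c6074c2d33d3'
  AND ct.country = 'SE';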

A Lambda runs every 10 minutes and refreshes the views, so any new campaign or campaign update is reflected in the Availability Calendar within at most 10 minutes. We use concurrent refreshes, which means no reads from the views are blocked by the updates. In other words, it doesn't matter to the consumers of these tables whether an update takes 1, 10, or 100 seconds, because reads are unaffected by the view refreshes.
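The scheduled job boils down to two statements like these:

-- CONCURRENTLY builds the new contents alongside the old ones and then
-- swaps them in, so readers are never blocked while the refresh runs.
REFRESH MATERIALIZED VIEW CONCURRENTLY campaign_by_show;
REFRESH MATERIALIZED VIEW CONCURRENTLY country_targeting_view;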

From Snail Pace to Supersonic Speed

Okay, so did the materialized views improve anything? If so, how much better is it? Well, I will let this graph speak for itself:


Median response time went from 2000ms to 50ms! 🎉

The median response time saw a colossal transformation, from a sluggish 2000ms to a lightning-fast 50ms. That's a 97.5% load off our shoulders, without changing any of the underlying data storage or crafting a bespoke caching solution.

If you are in a similar pickle, wrestling with a sluggish SQL query, and you're not keen on system overhauls, consider the magic of materialized views to precompute the expensive parts. And if you do, please leave a comment letting us know how it went.
