How to use BigQuery Query Caching with Dynamic Wildcard Tables

mesmacosta

Marcelo Costa

Posted on December 29, 2023

How to use BigQuery Query Caching with Dynamic Wildcard Tables

The Problem: Caching does not work with wildcard tables

From BigQuery official docs:

Wildcard Limitation

Let's say you have some tables named my_data_2023_*, where the asterisk represents various months. You want to analyze data across all these tables. Since BigQuery doesn't know automatically when new tables were created, it will invalidate any available cache and run a fresh query, so cache won't be used.

Just for reference, it's not a good practice to use date sharded tables:
Image description
Recently I faced a scenario where tables where dynamically created based on a business domain field, the date example is only for illustration purposes, if you are using sharded tables, the better solution is to migrate it to BigQuery partitions instead.

The Solution: Union THEM ALL!

Enter the BigQuery Information Schema:

The BigQuery INFORMATION_SCHEMA views are read-only, system-defined views that provide metadata information about your BigQuery objects.

Image description

We can use the tables view to dynamically generate a list of all tables matching our pattern (e.g., my_data_2023_*). Then, we leverage UNION to combine individual queries for each identified table.

Here's a sample using Python:

from google.cloud import bigquery

client = bigquery.Client()

# Specify the dataset and wildcard pattern
dataset_id = "your-project.your_dataset"
wildcard_pattern = "my_data_2023_"

# Query the INFORMATION_SCHEMA to get matching table names
query = f"""
    SELECT table_name
    FROM `{dataset_id}.INFORMATION_SCHEMA.TABLES`
    WHERE table_name LIKE '{wildcard_pattern}%'
"""

rows = list(client.query(f"SELECT table_name FROM `{dataset_id}.INFORMATION_SCHEMA.TABLES` "
                                f"where table_name like '{your_table_prefix_}%'"))

if not rows:
    return

view_query = __create_sql(dict(rows[0])["table_name"])
for row in table_names[1:]:
    view_query = f"""
    {view_query}
    UNION ALL
    {__create_sql(dict(row['table_name'])}
"""
Enter fullscreen mode Exit fullscreen mode

I omitted the __create_sql function, which is just a logic that creates a complex SQL based on each table name, with the generated SQL then you can use it to create a BigQuery view:

view = bigquery.Table(table_ref)
view.view_query = view_query
client.create_table(view, exists_ok=True)
Enter fullscreen mode Exit fullscreen mode

Hope that helps, cheers!

💖 💪 🙅 🚩
mesmacosta
Marcelo Costa

Posted on December 29, 2023

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related