Haunted data pipeline

dlt-library

adrian

Posted on October 30, 2023

Suppose, in the spirit of Halloween, you want to add some haunting to your DWH. How do you do that?

  1. Create a function that returns some strange messages
  2. Call it from your pipelines to log the strangeness for lols
  3. Watch the logs get haunted :)

Sample code below. Happy haunting!

import random

import dlt
import requests
from bs4 import BeautifulSoup


# Fetch spooky quotes from Goodreads (one HTTP request per call)
def fetch_spooky_quotes_goodreads():
    url = "https://www.goodreads.com/quotes/tag/spooky"
    response = requests.get(url, timeout=10)
    soup = BeautifulSoup(response.text, 'html.parser')

    quotes = soup.find_all('div', class_='quoteText')

    for quote in quotes:
        # Collapse the whitespace surrounding each quote in the page markup
        yield quote.get_text(" ", strip=True)



# Generate dummy user rows, occasionally printing a spooky quote
def generate_dummy_data(num_rows=1000000):
    # Fetch the quotes once up front instead of re-requesting the page on every haunted row
    quotes = list(fetch_spooky_quotes_goodreads())
    for i in range(1, num_rows + 1):
        data = {
            'ID': i,
            'Name': f'Name_{i}',
            'Age': random.randint(18, 70),
            'Gender': random.choice(['Male', 'Female']),
            'City': random.choice(['New York', 'Los Angeles', 'Chicago', 'Houston', 'Miami']),
            'Score': round(random.uniform(0, 100), 2)
        }
        if quotes and random.random() < 0.01:  # 1% chance to print
            print(random.choice(quotes))
        yield data



# view a small sample of the data
for row in generate_dummy_data(num_rows=10):
    print(row)

# open connection
pipeline = dlt.pipeline(
    destination='duckdb',
    dataset_name='raw_data'
)

# Upsert/merge: Update old records, insert new
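# The 'ID' field is loaded as column "id" under dlt's default snake_case naming
load_info = pipeline.run(
    generate_dummy_data(),
    write_disposition="merge",
    primary_key="id",
    table_name="users"
)
print(load_info)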
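
If you would rather have the haunting land in your actual logs than on stdout, here is a minimal sketch using Python's standard logging module. It is not part of the original sample: the haunt wrapper, the haunted_pipeline logger name, and the hard-coded messages are all hypothetical stand-ins, and the messages replace the Goodreads scrape so the example runs offline.

```python
import logging
import random

# Hypothetical stand-in messages; the sample above pulls quotes from Goodreads instead
SPOOKY_MESSAGES = [
    "Something is watching this table...",
    "The rows... they are not alone.",
    "Who dropped that index at midnight?",
]

# Logger name is an assumption; use whatever your pipeline already logs under
logger = logging.getLogger("haunted_pipeline")


def haunt(rows, messages=SPOOKY_MESSAGES, chance=0.01, rng=random):
    """Yield rows unchanged, logging a random spooky message for roughly `chance` of them."""
    for row in rows:
        if rng.random() < chance:
            logger.warning(rng.choice(messages))
        yield row
```

Because haunt is just a generator wrapper, it could sit between any row source and the loader, for example pipeline.run(haunt(rows), ...), without touching the data itself.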
