adrian
Posted on October 30, 2023
Suppose, in the spirit of Halloween, you want to add some haunting to your data warehouse (DWH). How do you do that?
- Create a function that returns some strange messages
- Call it from your pipelines to log the strangeness for lols
- Watch the logs get haunted :)
Sample code below. Happy haunting!
import random
import dlt
import requests
from bs4 import BeautifulSoup
# Function to fetch spooky quotes from Goodreads
def fetch_spooky_quotes_goodreads():
    url = "https://www.goodreads.com/quotes/tag/spooky"
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    quotes = soup.find_all('div', class_='quoteText')
    for quote in quotes:
        yield quote.text
def generate_dummy_data(num_rows=1000000):
    # Fetch the quotes once, instead of re-requesting the page for every haunted row
    quotes = list(fetch_spooky_quotes_goodreads())
    for i in range(1, num_rows + 1):
        data = {
            'ID': i,
            'Name': f'Name_{i}',
            'Age': random.randint(18, 70),
            'Gender': random.choice(['Male', 'Female']),
            'City': random.choice(['New York', 'Los Angeles', 'Chicago', 'Houston', 'Miami']),
            'Score': round(random.uniform(0, 100), 2)
        }
        if quotes and random.random() < 0.01:  # 1% chance to print
            # Pick a random quote so the haunting is not always the same line
            print(random.choice(quotes).strip())
        yield data
# view data
for row in generate_dummy_data():
    print(row)
# open connection
pipeline = dlt.pipeline(
    destination='duckdb',
    dataset_name='raw_data'
)
# Upsert/merge: Update old records, insert new
load_info = pipeline.run(
    generate_dummy_data(),  # pass the generator itself; rows are haunted as they stream through
    write_disposition="merge",
    primary_key="id",
    table_name="users"
)
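If you would rather have the strangeness land in your actual logs than on stdout, the same trick can go through Python's standard `logging` module. The sketch below is one possible variant, not part of the sample above: `haunt`, `FALLBACK_MESSAGES`, the `probability` parameter, and the logger name are all made-up names for illustration, and the hard-coded messages stand in for the Goodreads scrape so the example runs offline.

```python
import logging
import random

# Hypothetical logger name; use whatever your pipeline already logs to.
logger = logging.getLogger("haunted_pipeline")

# Placeholder messages (assumption): swap in the scraped quotes if you have them.
FALLBACK_MESSAGES = [
    "Who keeps truncating this table at midnight?",
    "The NULLs are multiplying...",
    "This row was already here before the load started.",
]


def haunt(rows, messages=FALLBACK_MESSAGES, probability=0.01, rng=random):
    """Yield rows unchanged, occasionally logging a spooky message."""
    for row in rows:
        # Roughly `probability` of the rows trigger a haunted log line.
        if messages and rng.random() < probability:
            logger.warning(rng.choice(messages))
        yield row


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    sample = ({"ID": i} for i in range(1, 501))
    print(sum(1 for _ in haunt(sample, probability=0.05)), "rows passed through")
```

Because `haunt` is a plain generator wrapper, it could be handed to the pipeline in place of the raw generator, for example `pipeline.run(haunt(rows), ...)`, without touching the data itself.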