Create a DBT w/Postgres

obrotoks

Obrotoks

Posted on June 1, 2023

Create a DBT w/Postgres

Goal

Create a docker with postges. Then create another docker with dbt and launch an operation with dbt. Source code

Requirements

  • Docker
  • Postgres
  • Python (with DBT)

1 - Create a Server with Postgres

I've already create a post and how it could be done.

2 - Create a Server with Python

For this part, I've used this post to help about how to construct a python server with dbt.

However, there are a difference between these article, we want to launch this task only one time.

2.1 - Python Requirements

For this purpose, we will need this requirements of python:

dbt-postgres==1.0.0
markupsafe==2.0.1
Enter fullscreen mode Exit fullscreen mode

DBT library to make the ETL process & markupsafe to avoid problems with the config profile yml

2.2 - Dockerfile

We should create this docker file to try to create our server to launch our DBT process


FROM python:3.8.1-slim-buster

# Working directory
WORKDIR /app

# Install OS dependencies
RUN apt-get update && apt-get install -qq -y \
    git gcc build-essential libpq-dev --fix-missing --no-install-recommends \ 
    && apt-get clean

# Make sure we are using latest pip
RUN pip install --upgrade pip

# Create directory for dbt config
RUN mkdir -p /root/.dbt

# Copy requirements.txt
COPY requirements.txt requirements.txt

# Install dependencies
RUN pip install -r requirements.txt

# Copy dbt profile
COPY profiles.yml /root/.dbt/profiles.yml


# Copy source code
COPY dbt_post/ /app

WORKDIR /app/dbt_post
# Start the dbt RPC server
CMD ["dbt","run"]

Enter fullscreen mode Exit fullscreen mode

2.3 DBT Structure

To avoid have a lot of structure, and test all the structure, I've only change 3 files to try to test if the process works fine.

I've just made a query of a distinct num_age of one table which is already loaded in the postgres.

For this reason in models/raw, I've created two files

1- Schema.yml - to get all the information about the source file

version: 2

sources: 
  - name: InfoClients
    description: "Information about Clients"
    database: "RAW"
    schema: public
    tables:
      - name: newclients

Enter fullscreen mode Exit fullscreen mode

2- Age - to get all the diferent values of a table:


with final as (
    select 
        distinct num_age
    from {{ source('InfoClients','newclients')}}
)
select * 
from final

Enter fullscreen mode Exit fullscreen mode

Finally, I've already modify dbt_project.yml to try to get the model

name: 'dbt_post'
version: '1.0.0'
config-version: 2

# This setting configures which "profile" dbt uses for this project.
profile: 'dbt_post'

# These configurations specify where dbt should look for different types of files.
# The `model-paths` config, for example, states that models in this project can be
# found in the "models/" directory. You probably won't need to change these!
model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

clean-targets:         # directories to be removed by `dbt clean`
  - "target"
  - "dbt_packages"


# Configuring models
# Full documentation: https://docs.getdbt.com/docs/configuring-models

# In this example config, we tell dbt to build all models in the example/
# directory as views. These settings can be overridden in the individual model
# files using the `{{ config(...) }}` macro.
models:
  raw:

Enter fullscreen mode Exit fullscreen mode

3 Docker compose

Finally, I've already merge all the information into one single structure:


version: '3.9'
services:
  post_db:
    build: . 
    user: postgres
    image: postgres/test:v1
    environment:
      - POSTGRES_USER=${PS_USER}
      - POSTGRES_PASSWORD=${PS_PASSWORD}
      - PG_DATA:/var/lib/postgresql/data/pgdata
      - POSTGRES_DB=${PS_DB}
    healthcheck:
      test: ["CMD-SHELL","pg_isready -U ${PS_USER} ${PS_PASSWORD}"]
      interval: 10s
      timeout: 5s
      retries: 5
    ports:
      - "5432:5432"
    volumes:
      - db-data:/var/lib/postgresql/data
    restart: unless-stopped
  python:
    build:
      context: .
      dockerfile: Dockerfile.python
      args:
        PY_VERSION: ${PY_VERSION}
    depends_on:
      - post_db
volumes:
  db-data: 
Enter fullscreen mode Exit fullscreen mode

4 Checks if table works

When it runs all:

docker compose up
Enter fullscreen mode Exit fullscreen mode

... and when you check it on the database. You could see there is a view with a name of the ages:

Check of the code

5 Conclusion

I've spend a lot of time trying to get the information between my machine and docker. The reason: I've already have the port ocupy for other project and I've couldn't see if postgres it was doing something.

If you had the same problem as me in Linux you could see your ports in here:

lsof -i :<your desired port>
Enter fullscreen mode Exit fullscreen mode

If you could see anything try with sudo, because in my case i've couldn't see another project working with this port.

When you already found which one is your not desired project then :

(sudo) kill <PID>
Enter fullscreen mode Exit fullscreen mode

Have a nice day:).

💖 💪 🙅 🚩
obrotoks
Obrotoks

Posted on June 1, 2023

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related

Create a DBT w/Postgres
beginners Create a DBT w/Postgres

June 1, 2023