Obrotoks
Posted on June 1, 2023
Goal
Create a docker with postges. Then create another docker with dbt and launch an operation with dbt. Source code
Requirements
- Docker
- Postgres
- Python (with DBT)
1 - Create a Server with Postgres
I've already create a post and how it could be done.
2 - Create a Server with Python
For this part, I've used this post to help about how to construct a python server with dbt.
However, there are a difference between these article, we want to launch this task only one time.
2.1 - Python Requirements
For this purpose, we will need this requirements of python:
dbt-postgres==1.0.0
markupsafe==2.0.1
DBT library to make the ETL process & markupsafe to avoid problems with the config profile yml
2.2 - Dockerfile
We should create this docker file to try to create our server to launch our DBT process
FROM python:3.8.1-slim-buster
# Working directory
WORKDIR /app
# Install OS dependencies
RUN apt-get update && apt-get install -qq -y \
git gcc build-essential libpq-dev --fix-missing --no-install-recommends \
&& apt-get clean
# Make sure we are using latest pip
RUN pip install --upgrade pip
# Create directory for dbt config
RUN mkdir -p /root/.dbt
# Copy requirements.txt
COPY requirements.txt requirements.txt
# Install dependencies
RUN pip install -r requirements.txt
# Copy dbt profile
COPY profiles.yml /root/.dbt/profiles.yml
# Copy source code
COPY dbt_post/ /app
WORKDIR /app/dbt_post
# Start the dbt RPC server
CMD ["dbt","run"]
2.3 DBT Structure
To avoid have a lot of structure, and test all the structure, I've only change 3 files to try to test if the process works fine.
I've just made a query of a distinct num_age of one table which is already loaded in the postgres.
For this reason in models/raw, I've created two files
1- Schema.yml - to get all the information about the source file
version: 2
sources:
- name: InfoClients
description: "Information about Clients"
database: "RAW"
schema: public
tables:
- name: newclients
2- Age - to get all the diferent values of a table:
with final as (
select
distinct num_age
from {{ source('InfoClients','newclients')}}
)
select *
from final
Finally, I've already modify dbt_project.yml to try to get the model
name: 'dbt_post'
version: '1.0.0'
config-version: 2
# This setting configures which "profile" dbt uses for this project.
profile: 'dbt_post'
# These configurations specify where dbt should look for different types of files.
# The `model-paths` config, for example, states that models in this project can be
# found in the "models/" directory. You probably won't need to change these!
model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]
clean-targets: # directories to be removed by `dbt clean`
- "target"
- "dbt_packages"
# Configuring models
# Full documentation: https://docs.getdbt.com/docs/configuring-models
# In this example config, we tell dbt to build all models in the example/
# directory as views. These settings can be overridden in the individual model
# files using the `{{ config(...) }}` macro.
models:
raw:
3 Docker compose
Finally, I've already merge all the information into one single structure:
version: '3.9'
services:
post_db:
build: .
user: postgres
image: postgres/test:v1
environment:
- POSTGRES_USER=${PS_USER}
- POSTGRES_PASSWORD=${PS_PASSWORD}
- PG_DATA:/var/lib/postgresql/data/pgdata
- POSTGRES_DB=${PS_DB}
healthcheck:
test: ["CMD-SHELL","pg_isready -U ${PS_USER} ${PS_PASSWORD}"]
interval: 10s
timeout: 5s
retries: 5
ports:
- "5432:5432"
volumes:
- db-data:/var/lib/postgresql/data
restart: unless-stopped
python:
build:
context: .
dockerfile: Dockerfile.python
args:
PY_VERSION: ${PY_VERSION}
depends_on:
- post_db
volumes:
db-data:
4 Checks if table works
When it runs all:
docker compose up
... and when you check it on the database. You could see there is a view with a name of the ages:
5 Conclusion
I've spend a lot of time trying to get the information between my machine and docker. The reason: I've already have the port ocupy for other project and I've couldn't see if postgres it was doing something.
If you had the same problem as me in Linux you could see your ports in here:
lsof -i :<your desired port>
If you could see anything try with sudo, because in my case i've couldn't see another project working with this port.
When you already found which one is your not desired project then :
(sudo) kill <PID>
Have a nice day:).
Posted on June 1, 2023
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.