How to develop Apache Airflow DAGs in Docker Compose
Jakub T
Posted on May 5, 2020
How to run a development environment on docker-compose
A quick overview of how to run Apache Airflow for development and tests on your local machine using docker-compose.
We will still be using the unofficial puckel/docker-airflow
image. There is already an official Docker image, but I haven't tested it yet.
Requirements
- docker
- docker-compose - https://docs.docker.com/compose/install/
Project structure
- docker-compose.yml - configuration file for the docker-compose
- dags - will contain all our dags
- lib - will contain all our custom code
- test - will contain our pytests
- .env - file with environment variables that we wish to pass to the containers
The environment variables are very handy because they allow you to customize almost everything in Airflow (https://airflow.apache.org/docs/stable/best-practices.html?highlight=environment#configuration)
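For example, a minimal .env could look like the following (the values are just illustrative; the AIRFLOW__{SECTION}__{KEY} naming convention maps each variable onto a [section] / key entry of airflow.cfg):

```
# maps onto [core] load_examples in airflow.cfg
AIRFLOW__CORE__LOAD_EXAMPLES=False
# maps onto [webserver] dag_default_view
AIRFLOW__WEBSERVER__DAG_DEFAULT_VIEW=graph
```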
docker-compose.yml
The basic structure:
version: '2.1'
services:
  postgres:
    image: postgres:9.6
    environment:
      - POSTGRES_USER=airflow
      - POSTGRES_PASSWORD=airflow
      - POSTGRES_DB=airflow
  webserver:
    image: puckel/docker-airflow:1.10.9
    restart: always
    mem_limit: 2048m
    depends_on:
      - postgres
    env_file:
      - .env
    environment:
      - LOAD_EX=n
      - EXECUTOR=Local
    volumes:
      - ./dags:/usr/local/airflow/dags
      - ./test:/usr/local/airflow/test
      - ./plugins:/usr/local/airflow/plugins
      # Uncomment to include custom plugins
      - ./requirements.txt:/requirements.txt
      - ~/.aws:/usr/local/airflow/.aws
    ports:
      - "8080:8080"
    command: webserver
    healthcheck:
      test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-webserver.pid ]"]
      interval: 30s
      timeout: 30s
      retries: 3
As you can see, there are several things going on here:
- we pass custom environment variables straight from the dotenv file (as a best practice, the .env file itself is not committed to the repository)
- we use a postgres instance running as another docker container
- we share our dags/test/plugins directories with the host, so we can edit the code on our local machine and run all the tests in the container
Dummy DAG
Let's create our first DAG: dags/dummy_dag.py
from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from datetime import datetime

with DAG('dummy_dag', start_date=datetime(2016, 1, 1)) as dag:
    op = DummyOperator(task_id='op')
Running the environment
$ docker-compose up
Starting airflow-on-docker-compose_postgres_1 ... done
Starting airflow-on-docker-compose_webserver_1 ... done
Attaching to airflow-on-docker-compose_postgres_1, airflow-on-docker-compose_webserver_1
[...]
webserver_1 | [2020-05-05 10:19:08,741] {{__init__.py:51}} INFO - Using executor LocalExecutor
webserver_1 | [2020-05-05 10:19:08,743] {{dagbag.py:403}} INFO - Filling up the DagBag from /usr/local/airflow/dags
Let's open http://localhost:8080 in the browser.
Running the tests in the environment
To run the tests in the environment we can just run:
docker-compose run webserver bash
This will give us a bash shell inside the container:
➜ airflow-on-docker-compose git:(master) ✗ docker-compose run webserver bash
Starting airflow-on-docker-compose_postgres_1 ... done
WARNING: You are using pip version 20.0.2; however, version 20.1 is available.
You should consider upgrading via the '/usr/local/bin/python -m pip install --upgrade pip' command.
airflow@be3e69366e23:~$ ls
airflow.cfg dags plugins test
airflow@be3e69366e23:~$ pytest test
bash: pytest: command not found
Of course, we haven't installed pytest yet - that's easy to fix:
$ echo "pytest" >> requirements.txt
$ docker-compose run webserver bash
Starting airflow-on-docker-compose_postgres_1 ... done
Collecting pytest
Downloading pytest-5.4.1-py3-none-any.whl (246 kB)
|████████████████████████████████| 246 kB 222 kB/s
Collecting more-itertools>=4.0.0
Downloading more_itertools-8.2.0-py3-none-any.whl (43 kB)
|████████████████████████████████| 43 kB 3.1 MB/s
Collecting wcwidth
Downloading wcwidth-0.1.9-py2.py3-none-any.whl (19 kB)
Requirement already satisfied: importlib-metadata>=0.12; python_version < "3.8" in /usr/local/lib/python3.7/site-packages (from pytest->-r /requirements.txt (line 1)) (1.5.0)
Collecting packaging
Downloading packaging-20.3-py2.py3-none-any.whl (37 kB)
Collecting pluggy<1.0,>=0.12
Downloading pluggy-0.13.1-py2.py3-none-any.whl (18 kB)
Collecting py>=1.5.0
Downloading py-1.8.1-py2.py3-none-any.whl (83 kB)
|████████████████████████████████| 83 kB 956 kB/s
Requirement already satisfied: attrs>=17.4.0 in /usr/local/lib/python3.7/site-packages (from pytest->-r /requirements.txt (line 1)) (19.3.0)
Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.7/site-packages (from importlib-metadata>=0.12; python_version < "3.8"->pytest->-r /requirements.txt (line 1)) (2.2.0)
Requirement already satisfied: six in /usr/local/lib/python3.7/site-packages (from packaging->pytest->-r /requirements.txt (line 1)) (1.14.0)
Collecting pyparsing>=2.0.2
Downloading pyparsing-2.4.7-py2.py3-none-any.whl (67 kB)
|████████████████████████████████| 67 kB 624 kB/s
Installing collected packages: more-itertools, wcwidth, pyparsing, packaging, pluggy, py, pytest
WARNING: The scripts py.test and pytest are installed in '/usr/local/airflow/.local/bin' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
Successfully installed more-itertools-8.2.0 packaging-20.3 pluggy-0.13.1 py-1.8.1 pyparsing-2.4.7 pytest-5.4.1 wcwidth-0.1.9
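The same mechanism works for any other dependency: the puckel image's entrypoint pip-installs /requirements.txt each time the container starts. So a (hypothetical) requirements.txt for this project could pin versions explicitly:

```
# installed by the image entrypoint on every container start
pytest==5.4.1
# plus anything your DAGs import, e.g.:
# boto3==1.12.0
```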
We can implement our first basic test, taken directly from the Airflow best-practices docs (https://github.com/apache/airflow/blob/master/docs/best-practices.rst), in test/test_dag_loading.py:
from airflow.models import DagBag

def test_dag_loading():
    dagbag = DagBag()
    dag = dagbag.get_dag(dag_id='dummy_dag')
    assert dagbag.import_errors == {}
    assert dag is not None
    assert len(dag.tasks) == 1
And now we can freely run our tests:
airflow@a6ca8c1b706d:~$ .local/bin/pytest
========================================================================== test session starts ==========================================================================
platform linux -- Python 3.7.6, pytest-5.4.1, py-1.8.1, pluggy-0.13.1
rootdir: /usr/local/airflow
plugins: celery-4.4.0
collected 1 item
test/test_dag_loading.py .
===================================================================== 1 passed in 0.83s =====================================================================
All the code can be found here: https://github.com/troszok/airflow-on-docker-compose