How to develop Apache Airflow DAGs in Docker Compose

digitaldisorder

Jakub T

Posted on May 5, 2020

How to develop Apache Airflow DAGs in Docker Compose

How to run a development environment on docker-compose

Quick overview of how to run Apache airflow for development and tests on your local machine using docker-compose.

We will be still using unofficial puckel/docker-airflow image. There is already an official docker image but I didn't test it yet.

Requirements

Project structure

  • docker-compose.yml - configuration file for the docker-compose
  • dags - will contain all our dags
  • lib - will contain all our custom code
  • test - will contain our pytests
  • .env - file with environment variables that we wish to include the containers

The environment variables are very handy because they allow you to customize almost everything in Airflow (https://airflow.apache.org/docs/stable/best-practices.html?highlight=environment#configuration)

docker-compose.yml

The basic structure:

version: '2.1'
services:
    postgres:
        image: postgres:9.6
        environment:
            - POSTGRES_USER=airflow
            - POSTGRES_PASSWORD=airflow
            - POSTGRES_DB=airflow
    webserver:
        image: puckel/docker-airflow:1.10.9
        restart: always
        mem_limit: 2048m
        depends_on:
            - postgres
        env_file:
            - .env
        environment:
            - LOAD_EX=n
            - EXECUTOR=Local
        volumes:
            - ./dags:/usr/local/airflow/dags
            - ./test:/usr/local/airflow/test
            - ./plugins:/usr/local/airflow/plugins
            # Uncomment to include custom plugins
            - ./requirements.txt:/requirements.txt
            - ~/.aws:/usr/local/airflow/.aws
        ports:
            - "8080:8080"
        command: webserver
        healthcheck:
            test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-webserver.pid ]"]
            interval: 30s
            timeout: 30s
            retries: 3

As you can see we have several things there:

  • we allow to pass custom environment variables straight from the dotenv file (best practice is not include it in the files)
  • we will use postgres instance running as another docker container
  • we share our dags/test/plugins directories with the host so we can just edit our code on our local machine and run all the tests in container

Dummy DAG

Let's edit our first DAG: dags/dummy_dag.py

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from datetime import datetime

with DAG('my_dag', start_date=datetime(2016, 1, 1)) as dag:
    op = DummyOperator(task_id='op')

Running the environment

$ docker-compose up

Starting airflow-on-docker-compose_postgres_1 ... done
Starting airflow-on-docker-compose_webserver_1 ... done
Attaching to airflow-on-docker-compose_postgres_1, airflow-on-docker-compose_webserver_1
[...]
webserver_1  | __init__.py:51}} INFO - Using executor [2020-05-05 10:19:08,741] {{dagbag.py:403}} INFO - Filling up the DagBag from /usr/local/airflow/dags
webserver_1  | LocalExecutor
webserver_1  | [2020-05-05 10:19:08,743] {{dagbag.py:403}} INFO - Filling up the DagBag from /usr/local/airflow/dags

Let's open the (http://localhost:8080)

Airflow instance on docker-compose

Running the tests in the environment

In order to run the tests in the environment we can just run:

docker-compose run webserver bash

This will give us access to the bash running in the container:

➜  airflow-on-docker-compose git:(master) ✗ docker-compose run webserver bash
Starting airflow-on-docker-compose_postgres_1 ... done
WARNING: You are using pip version 20.0.2; however, version 20.1 is available.
You should consider upgrading via the '/usr/local/bin/python -m pip install --upgrade pip' command.
airflow@be3e69366e23:~$ ls
airflow.cfg  dags  plugins  test
airflow@be3e69366e23:~$ pytest test
bash: pytest: command not found

Of course we didn't install pytest yet - this is very easy:

$ echo "pytest" >> requirements.txt
$ docker-compose run webserver bash
Starting airflow-on-docker-compose_postgres_1 ... done
Collecting pytest
  Downloading pytest-5.4.1-py3-none-any.whl (246 kB)
     |████████████████████████████████| 246 kB 222 kB/s
Collecting more-itertools>=4.0.0
  Downloading more_itertools-8.2.0-py3-none-any.whl (43 kB)
     |████████████████████████████████| 43 kB 3.1 MB/s
Collecting wcwidth
  Downloading wcwidth-0.1.9-py2.py3-none-any.whl (19 kB)
Requirement already satisfied: importlib-metadata>=0.12; python_version < "3.8" in /usr/local/lib/python3.7/site-packages (from pytest->-r /requirements.txt (line 1)) (1.5.0)
Collecting packaging
  Downloading packaging-20.3-py2.py3-none-any.whl (37 kB)
Collecting pluggy<1.0,>=0.12
  Downloading pluggy-0.13.1-py2.py3-none-any.whl (18 kB)
Collecting py>=1.5.0
  Downloading py-1.8.1-py2.py3-none-any.whl (83 kB)
     |████████████████████████████████| 83 kB 956 kB/s
Requirement already satisfied: attrs>=17.4.0 in /usr/local/lib/python3.7/site-packages (from pytest->-r /requirements.txt (line 1)) (19.3.0)
Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.7/site-packages (from importlib-metadata>=0.12; python_version < "3.8"->pytest->-r /requirements.txt (line 1)) (2.2.0)
Requirement already satisfied: six in /usr/local/lib/python3.7/site-packages (from packaging->pytest->-r /requirements.txt (line 1)) (1.14.0)
Collecting pyparsing>=2.0.2
  Downloading pyparsing-2.4.7-py2.py3-none-any.whl (67 kB)
     |████████████████████████████████| 67 kB 624 kB/s
Installing collected packages: more-itertools, wcwidth, pyparsing, packaging, pluggy, py, pytest
  WARNING: The scripts py.test and pytest are installed in '/usr/local/airflow/.local/bin' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
Successfully installed more-itertools-8.2.0 packaging-20.3 pluggy-0.13.1 py-1.8.1 pyparsing-2.4.7 pytest-5.4.1 wcwidth-0.1.9

We can implement our first basic test taken directly from (https://github.com/apache/airflow/blob/master/docs/best-practices.rst)

from airflow.models import DagBag

def test_dag_loading():
    dagbag = DagBag()
    dag = dagbag.get_dag(dag_id='dummy_dag')
    assert dagbag.import_errors == {}
    assert dag is not None
    assert len(dag.tasks) == 1

And now we can freely run our tests:

airflow@a6ca8c1b706d:~$ .local/bin/pytest
========================================================================== test session starts ==========================================================================
platform linux -- Python 3.7.6, pytest-5.4.1, py-1.8.1, pluggy-0.13.1
rootdir: /usr/local/airflow
plugins: celery-4.4.0
collected 1 item

test/test_dag_loading.py .

===================================================================== 1 passed in 0.83s =====================================================================

All the code can be found here: https://github.com/troszok/airflow-on-docker-compose

💖 💪 🙅 🚩
digitaldisorder
Jakub T

Posted on May 5, 2020

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related