Introduction

AirFlow is an open-source workflow scheduler written in Python that ships with a rich UI.

Installation

Python

aptitude install python
aptitude install python-dev
aptitude install python-pip
aptitude install libmysqlclient-dev

AirFlow

pip install airflow

Supervisor

aptitude install supervisor

Configuration

AirFlow

Initialization

airflow initdb

Adding User Login

Install the password authentication module.

pip install "airflow[password]"

Enable password authentication in airflow.cfg.

vim airflow.cfg
## Under [webserver], add
authenticate = True
auth_backend = airflow.contrib.auth.backends.password_auth
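As a quick sanity check after editing the file, the two settings above can be verified with Python's standard configparser. This is only a minimal sketch: the function name missing_auth_settings is hypothetical (it is not part of AirFlow), and it only compares the two [webserver] options named in this section.

```python
import configparser

# Expected [webserver] settings, copied from the configuration step above.
EXPECTED = {
    "authenticate": "True",
    "auth_backend": "airflow.contrib.auth.backends.password_auth",
}


def missing_auth_settings(cfg_text):
    """Return {option: actual_value} for every expected option that is absent or different.

    Hypothetical helper for illustration; an empty dict means both options match.
    """
    # interpolation=None keeps '%' characters in other options from raising errors.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    problems = {}
    for key, expected in EXPECTED.items():
        # fallback=None also covers a missing [webserver] section.
        actual = parser.get("webserver", key, fallback=None)
        if actual != expected:
            problems[key] = actual
    return problems
```

To use it, read the contents of airflow.cfg into a string and pass that string to the function; any option it reports still needs to be added or corrected under [webserver].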

Switch to the airflow directory and start a Python interpreter.

cd ~/airflow
python

Run the following Python commands to create the user.

import airflow
from airflow import models, settings
from airflow.contrib.auth.backends.password_auth import PasswordUser
user = PasswordUser(models.User())
user.username = 'user_name'
user.email = 'email@example.com'
user.password = 'password'
session = settings.Session()
session.add(user)
session.commit()
session.close()
exit()

Supervisor

Add Supervisor program entries so that the webserver and scheduler are started and managed by Supervisor.

vim /etc/supervisor/conf.d/airflow.conf

## Add
[program:airflow_webserver]
command=airflow webserver
user=ubuntu
stderr_logfile=/var/log/airflow/webserver.err.log
stdout_logfile=/var/log/airflow/webserver.out.log

[program:airflow_scheduler]
command=airflow scheduler
user=ubuntu
stderr_logfile=/var/log/airflow/scheduler.err.log
stdout_logfile=/var/log/airflow/scheduler.out.log
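The two program entries above differ only in the component name, so they can be generated rather than typed by hand. The following is a rough sketch, not an official tool: render_program and render_config are hypothetical helper names, and the defaults (user ubuntu, log directory /var/log/airflow) are simply the values used in this section.

```python
def render_program(component, user="ubuntu", log_dir="/var/log/airflow"):
    """Render one [program:...] section in the same shape as the entries above."""
    lines = [
        f"[program:airflow_{component}]",
        f"command=airflow {component}",
        f"user={user}",
        f"stderr_logfile={log_dir}/{component}.err.log",
        f"stdout_logfile={log_dir}/{component}.out.log",
    ]
    return "\n".join(lines) + "\n"


def render_config(components=("webserver", "scheduler"), **options):
    """Join the sections for all components, separated by a blank line."""
    return "\n".join(render_program(name, **options) for name in components)


if __name__ == "__main__":
    # Print the config so it can be reviewed before it is saved anywhere.
    print(render_config(), end="")
```

The printed text can be pasted into /etc/supervisor/conf.d/airflow.conf. The log directory should exist before Supervisor starts the programs, and running supervisorctl reread followed by supervisorctl update is the usual way to make Supervisor pick up a changed config file.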

Issues

ImportError: No module named pidlockfile

## Solution

aptitude remove python-lockfile
pip install lockfile

ImportError: cannot import name MySqlOperator

## Solution

pip install "airflow[celery]"
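Both issues above are import failures, so it can help to confirm that an import works after applying a fix and before restarting the services. The helper below is a small sketch using only the standard library; can_import is a hypothetical name, and the module paths mentioned in the usage note are assumptions inferred from the error messages, not something confirmed here.

```python
import importlib


def can_import(module_name, attribute=None):
    """Return True if module_name imports and, optionally, exposes attribute."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Covers a missing module as well as a missing parent package.
        return False
    return attribute is None or hasattr(module, attribute)
```

For example, can_import("lockfile.pidlockfile") would check the first fix, and can_import("airflow.operators", "MySqlOperator") the second, assuming those are the import paths that failed.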