@denismatveev ・ Oct 20, 2021 ・ Originally posted on faun.pub
Part 3
Implementing the Airflow HA solution
Well, let’s continue building the HA cluster.
This chapter of the tutorial covers installing and configuring the application used for DAG synchronization, as well as Airflow itself.
1.8 csync2
Install and configure csync2:
# apt-get -y install csync2
root@node1:~# cat /etc/csync2.cfg
# Please read the documentation:
# http://oss.linbit.com/csync2/paper.pdf
nossl * *;
tempdir /tmp/;
lock-timeout 30;
group DAGS
{
host node1;
host node2;
host node3;
key /root/airflow/csync2.key_airflow_dags;
include /root/airflow/dags;
auto younger;
}
Pre-shared keys can be generated using the following:
# csync2 -k filename
Then point the config to the file with your key in the line
key /root/airflow/csync2.key_airflow_dags
This key is used to authenticate the nodes.
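For example, you can generate the key once and copy it, together with the config, to the other nodes (paths taken from the config above; this sketch assumes root SSH access between the nodes is already set up, since csync2 expects the same config and key on every node):
# csync2 -k /root/airflow/csync2.key_airflow_dags
# scp /root/airflow/csync2.key_airflow_dags node2:/root/airflow/
# scp /root/airflow/csync2.key_airflow_dags node3:/root/airflow/
# scp /etc/csync2.cfg node2:/etc/
# scp /etc/csync2.cfg node3:/etc/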
The directory to synchronize is specified with
include /root/airflow/dags;
So, add a job to crontab:
# crontab -e
* * * * * /usr/sbin/csync2 -A -x
Then you can create a file or directory, and it will appear on all nodes (i.e. it will be synchronized). Of course, only one key is used here across all nodes.
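A quick way to check that synchronization works is to create a test file on node1 and make sure it shows up on the other nodes after the next cron run (the file name below is just an illustration):
root@node1:~# touch /root/airflow/dags/sync_test.py
root@node2:~# ls /root/airflow/dags/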
To start the csync2 daemon automatically with the necessary parameters, modify the systemd unit:
root@node1:~# cat /lib/systemd/system/csync2.service
[Unit]
Description=Cluster file synchronization daemon
Documentation=man:csync2(1)
After=network.target
[Service]
ExecStart=/usr/sbin/csync2 -A -ii -l
StandardError=syslog
[Install]
WantedBy=multi-user.target
Start csync2:
# systemctl start csync2
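Since the unit file was modified, you may also need to reload systemd and enable the service so it starts on boot (on every node):
# systemctl daemon-reload
# systemctl enable csync2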
1.9 Airflow
And finally we have reached the moment when all the preparations are done and we can install Airflow itself:
pip3 install 'apache-airflow[celery]'
Initialize a database:
# airflow db init
Create systemd units for the Airflow components:
# cat /lib/systemd/system/airflow-worker.service
[Unit]
Description=Airflow celery worker daemon
After=network.target postgresql.service rabbitmq-server.service
Wants=postgresql.service rabbitmq-server.service
[Service]
User=root
Group=root
Type=simple
ExecStart=/usr/local/bin/airflow celery worker
Restart=on-failure
RestartSec=10s
[Install]
WantedBy=multi-user.target
# cat /lib/systemd/system/airflow-scheduler.service
[Unit]
Description=Airflow scheduler daemon
After=network.target postgresql.service rabbitmq-server.service
Wants=postgresql.service rabbitmq-server.service
[Service]
User=root
Group=root
Type=simple
ExecStart=/usr/local/bin/airflow scheduler
Restart=always
RestartSec=5s
[Install]
WantedBy=multi-user.target
# cat /lib/systemd/system/airflow-webserver.service
[Unit]
Description=Airflow webserver daemon
After=network.target postgresql.service rabbitmq-server.service
Wants=postgresql.service rabbitmq-server.service
[Service]
User=root
Group=root
Type=simple
ExecStart=/usr/local/bin/airflow webserver
Restart=on-failure
RestartSec=5s
PrivateTmp=true
[Install]
WantedBy=multi-user.target
Don’t forget to enable them.
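For example:
# systemctl daemon-reload
# systemctl enable airflow-scheduler airflow-webserver airflow-worker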
Create the config file in the default directory with the following command:
# mkdir ~/airflow
# airflow config list > ~/airflow/airflow.cfg
Find and modify the following settings:
in the [core] section:
executor = CeleryExecutor
sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@10.5.0.10/airflow
in the [celery] section:
broker_url = amqp://airflow:cafyilevyeHa@localhost:5672;amqp://airflow:cafyilevyeHa@node2:5672;amqp://airflow:cafyilevyeHa@node3:5672
result_backend = db+postgresql://airflow:airflow@10.5.0.10:5432/airflow
As for broker_url, I specified all three nodes, although this is somewhat redundant: Celery is expected to connect to localhost, and if the whole node is down, there is no reason to reconnect to another node. Still, I added all three for reliability. You can consider specifying only localhost.
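To double-check that Airflow picks up the new values, you can query them back (assuming Airflow 2.1+, where the airflow config get-value subcommand is available):
# airflow config get-value core executor
# airflow config get-value celery broker_url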
Create a user:
# airflow users create --username admin --firstname Denis --lastname Matveev --role Admin --email denis@example.com
Restart Airflow:
# systemctl restart airflow-scheduler airflow-webserver airflow-worker
Check whether any errors occur.
To get access to the web interface, I recommend using an SSH tunnel.
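A minimal example from your workstation, assuming the webserver listens on its default port 8080 on node1:
# ssh -L 8080:localhost:8080 root@node1
Then open http://localhost:8080 in a browser.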
Now the Airflow cluster is up and running. Double-check all settings again to make sure you didn’t make a mistake.
To test how the cluster handles failures, try the following:
# systemctl stop etcd patroni
# rm /var/lib/etcd/default/* -rf
Sometimes you can also remove all the PostgreSQL data:
# rm -rf /var/lib/postgresql/13/main/
But be careful! Do this only in a test environment; production servers are not for experiments, and you should understand what you are doing.
And restart the services:
# systemctl start etcd patroni
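To make sure the node has rejoined the PostgreSQL cluster, you can check the Patroni state with patronictl (the config path below is an assumption, adjust it to your setup):
# patronictl -c /etc/patroni/patroni.yml list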
Issues with csync2 can be resolved by removing its database files:
# systemctl stop csync2
# rm /var/lib/csync2/*.db3
# systemctl start csync2
Then take a look at the status of the processes by issuing the command:
# systemctl status <process>
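For instance, to check the whole stack at once:
# systemctl status airflow-scheduler airflow-webserver airflow-worker csync2 patroni etcd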
Last but not least, I strongly recommend implementing monitoring for this system. Which monitoring system to use is up to you; I use Zabbix with dedicated templates for PostgreSQL, templates for tracking whether processes are running, and so on. Any comments are welcome!