Installing and deploying Airflow on CentOS 7 in detail

Airflow (1.10) + Celery + Redis installation on CentOS 7#

Installation environment and versions##


Airflow 1.10.6

Python 3.6.8

MySQL 5.6

redis 3.3


Database installation###

Omitted (search online for instructions)

Airflow installation###

vim ~/.bashrc
# Add the following environment variable line:
export AIRFLOW_HOME=/opt/airflow
source ~/.bashrc

Install Airflow

pip install apache-airflow

# Generate the configuration file. Some errors may be reported; ignore them. As long as airflow.cfg and the related files are generated under the AIRFLOW_HOME directory, the run succeeded.
# If Python's environment variables are configured, run `airflow` directly.
# If not, run `./airflow` in the ${PYTHON_HOME}/lib/python3.6/site-packages/airflow/bin directory.
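A quick way to confirm that the configuration file was generated is to look for airflow.cfg under AIRFLOW_HOME. The sketch below is only an illustration: the helper name `config_generated` is hypothetical (it is not part of Airflow), and it assumes the file is called airflow.cfg and that Airflow falls back to ~/airflow when AIRFLOW_HOME is not set.

```python
import os
from pathlib import Path


def config_generated(airflow_home=None):
    """Return True if airflow.cfg exists under the given (or configured) AIRFLOW_HOME."""
    # Hypothetical helper: use the argument, then $AIRFLOW_HOME, then ~/airflow.
    home = airflow_home or os.environ.get("AIRFLOW_HOME") or os.path.expanduser("~/airflow")
    return (Path(home) / "airflow.cfg").is_file()


if __name__ == "__main__":
    print("airflow.cfg found:", config_generated())
```

If this prints False right after the first run, the AIRFLOW_HOME seen by the current shell is probably not the one set in ~/.bashrc.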

Install Airflow-related dependencies#

pip install 'apache-airflow[mysql]'
pip install 'apache-airflow[celery]'
pip install 'apache-airflow[redis]'


Modify the configuration file###

# SQLAlchemy connection
sql_alchemy_conn = mysql://username:password@localhost:3306/airflow
# Configure the executor
executor = CeleryExecutor
# Configure the Celery broker_url (6379 is the Redis default port; adjust it if your instance listens elsewhere)
broker_url = redis://localhost:6379/0
# Configure the result backend, which stores task state metadata
result_backend = db+mysql://username:password@localhost:3306/airflow
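Typos in these connection URLs are easy to make and only show up when a service fails to start. As a rough pre-check, the following sketch parses a URL with the standard library and reports obvious shape problems. The function name `check_url` is hypothetical. It checks only the scheme, host and port, and does not verify that the host resolves or that the service is reachable.

```python
from urllib.parse import urlsplit


def check_url(url, expected_schemes):
    """Return a list of problems found in a connection URL (empty if none)."""
    try:
        parts = urlsplit(url)
        port = parts.port  # raises ValueError when the port is not a number
    except ValueError:
        return ["malformed URL or port"]

    problems = []
    if parts.scheme not in expected_schemes:
        problems.append("unexpected scheme: %r" % parts.scheme)
    if not parts.hostname:
        problems.append("missing host")
    if port is None:
        problems.append("missing port")
    return problems


if __name__ == "__main__":
    # Example values only; substitute the ones from your own airflow.cfg.
    print(check_url("redis://localhost:6379/0", {"redis"}))
    print(check_url("db+mysql://username:password@localhost:3306/airflow", {"db+mysql"}))
```

An empty list means the URL is at least well-formed. A misspelled host name will still pass, so this complements rather than replaces an actual connection test.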

Create a user (the worker is not allowed to run as the root user)

# Create the user group and the user
groupadd airflow
useradd airflow -g airflow
# Change the group of the {AIRFLOW_HOME} directory to the new user group
cd /opt/
chgrp -R airflow airflow

Startup##

# Start web service in the foreground
airflow webserver 

# Start web service in the background
airflow webserver -D

# Start scheduler in the foreground
airflow scheduler

# Start scheduler in the background
airflow scheduler -D

Start worker

# On the worker host, simply start the airflow worker as an ordinary user
# Create the user airflow
useradd airflow

# Set a password for the airflow user
passwd airflow

# As the root user, change the permissions of the airflow folder so that it is fully open
chmod -R 777 /opt/airflow

# Switch to the ordinary user and run the airflow worker command
# At startup I found that the ordinary user reads a different ~/.bashrc file; adding AIRFLOW_HOME to it again fixes this
# If you configure the environment variables before creating the ordinary user, you may not hit this problem. I modified the environment variables after creating the user.
airflow worker


# To run the worker as root instead, set a temporary variable before starting it (it lasts only for the current session, not permanently)
export C_FORCE_ROOT="true"
# No need to switch users
cd /usr/local/python3/bin/

# Start worker service in the foreground
airflow worker

# Start the worker service in the background
airflow worker -D
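The root-user rule above can be expressed as a small pre-flight check run before launching the worker. This is a simplified sketch, not Celery's own logic: the function name `worker_user_problem` is hypothetical, and it assumes that any non-empty C_FORCE_ROOT value counts as set.

```python
import os


def worker_user_problem(euid, environ):
    """Return a warning string if the worker would run as root without C_FORCE_ROOT, else None."""
    running_as_root = (euid == 0)
    forced = bool(environ.get("C_FORCE_ROOT"))  # assumption: any non-empty value counts
    if running_as_root and not forced:
        return "running as root without C_FORCE_ROOT; switch to an ordinary user"
    return None


if __name__ == "__main__":
    # os.geteuid() is available on Unix-like systems such as CentOS.
    print(worker_user_problem(os.geteuid(), os.environ))
```

Running it as the airflow user should print None. As root without the variable it prints the warning.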

Modify the time zone##

# In airflow.cfg
default_timezone = Asia/Shanghai

For reference, the source changes are as follows:
cd /usr/local/lib/python3.6/site-packages/airflow

# Below the line utc = pendulum.timezone('UTC') (line 27), add the following code.
# (In Airflow 1.10.x this line and the utcnow() function appear to live in utils/timezone.py.)
from airflow.configuration import conf
try:
    tz = conf.get("core", "default_timezone")
    if tz == "system":
        utc = pendulum.local_timezone()
    else:
        utc = pendulum.timezone(tz)
except Exception:
    pass

# Modify the utcnow() function (line 69)
Original code: d = dt.datetime.utcnow()
Amended to: d = dt.datetime.now()

# Below the line utc = pendulum.timezone('UTC') (line 37), add the same code.
# (In Airflow 1.10.x a second copy of this line appears to live in utils/sqlalchemy.py.)
from airflow.configuration import conf
try:
    tz = conf.get("core", "default_timezone")
    if tz == "system":
        utc = pendulum.local_timezone()
    else:
        utc = pendulum.timezone(tz)
except Exception:
    pass

# In the web UI template (most likely www/templates/admin/master.html in Airflow 1.10.x):
Change the code var UTCseconds = (x.getTime() + x.getTimezoneOffset()*60*1000);
to var UTCseconds = x.getTime();

Change the code "timeFormat":"H:i:s %UTC%",
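To make the effect of the time zone patch concrete, here is a stdlib-only sketch of the same idea: look up the configured zone, fall back to UTC if the lookup fails, and express a UTC moment in that zone. It does not use pendulum or Airflow. The names `select_timezone` and `to_configured_zone` are hypothetical, and a fixed UTC+8 offset stands in for Asia/Shanghai, which is an assumption that holds only because that zone currently observes no daylight saving time.

```python
import datetime as dt

# Stand-in for pendulum.timezone("Asia/Shanghai"): a fixed UTC+8 offset (assumption).
SHANGHAI = dt.timezone(dt.timedelta(hours=8), "Asia/Shanghai")
KNOWN_ZONES = {"UTC": dt.timezone.utc, "Asia/Shanghai": SHANGHAI}


def select_timezone(name):
    """Pick the configured zone; fall back to UTC if the name is unknown."""
    try:
        return KNOWN_ZONES[name]
    except KeyError:
        return dt.timezone.utc


def to_configured_zone(utc_moment, name):
    """Convert an aware UTC datetime to the configured default_timezone."""
    return utc_moment.astimezone(select_timezone(name))


if __name__ == "__main__":
    noon_utc = dt.datetime(2020, 1, 1, 12, 0, tzinfo=dt.timezone.utc)
    # Prints 2020-01-01T20:00:00+08:00
    print(to_configured_zone(noon_utc, "Asia/Shanghai").isoformat())
```

The converted value is the same instant as the UTC one. Only the displayed wall-clock time shifts by eight hours, which is the change the patch aims to surface in the UI.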

Configure email alerts, modifying the Airflow configuration file airflow.cfg##

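# The SMTP server itself is configured in the [smtp] section of airflow.cfg;
# the per-DAG alert options go in default_args in the DAG file:
default_args = {
    # Recipient mailbox
    'email': ['[email protected]'],
    # Whether to send mail when a task fails
    'email_on_failure': True,
    # Whether to send mail when a task is retried
    'email_on_retry': False,
}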
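Because default_args applies to every task in the DAG, it helps to remember that a value passed directly to an operator takes precedence over the default. The sketch below models that precedence with plain dictionaries. It is an illustration rather than Airflow's implementation, the name `effective_args` is hypothetical, and the email address is a placeholder.

```python
def effective_args(default_args, **task_kwargs):
    """Return the arguments a task ends up with: default_args, overridden by task-level values."""
    merged = dict(default_args)   # copy so the shared dict is not mutated
    merged.update(task_kwargs)    # explicit task arguments take precedence
    return merged


if __name__ == "__main__":
    default_args = {
        "email": ["ops@example.com"],  # placeholder address (assumption)
        "email_on_failure": True,
        "email_on_retry": False,
    }
    # A hypothetical noisy task that opts out of failure mail.
    print(effective_args(default_args, email_on_failure=False))
```

This way the alert settings live in one place, and only the exceptions need to be spelled out per task.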



When running tasks, I found that some of them produced abnormal data when they ran in parallel. Solutions:

Set it in Airflow's global configuration

Add a parameter to the DAG to control the entire DAG

from airflow import DAG

dag = DAG("dag_name",
          schedule_interval="0 12 * * *",
          max_active_runs=1)
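The effect of max_active_runs can be pictured with a toy model: given some queued runs and some runs already in progress, only as many new runs may start as there are free slots. This is not Airflow's scheduler code, just a sketch of the limit, and the function name `runs_allowed_to_start` is hypothetical.

```python
def runs_allowed_to_start(queued_runs, active_runs, max_active_runs):
    """Return how many queued DAG runs may start without exceeding the limit."""
    free_slots = max(max_active_runs - active_runs, 0)
    return min(queued_runs, free_slots)


if __name__ == "__main__":
    # With max_active_runs=1, a second run has to wait while one is active.
    print(runs_allowed_to_start(queued_runs=3, active_runs=0, max_active_runs=1))  # 1
    print(runs_allowed_to_start(queued_runs=3, active_runs=1, max_active_runs=1))  # 0
```

With the limit at 1, runs of the same DAG never overlap, which is what prevents the parallel-run data problems described above.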

Set the parameter in the Operator of each task

t3 = PythonOperator(

Please correct me if there are any errors#

