Cron vs Airflow.

Cron runs one command at a scheduled time. Airflow runs DAGs (directed acyclic graphs) of dependent tasks, with retries, backfills, parameter passing, and a web UI. If your workflow chains three or more dependent steps, you've probably outgrown cron.

What cron does well

  • Schedule a single command at a recurring time
  • Zero infrastructure: preinstalled on almost every Linux/Unix server
  • Trivial syntax (once you know it)
  • Understood by virtually every sysadmin
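To make the syntax concrete: the five fields in an entry like 0 2 * * * are minute, hour, day of month, month, and day of week. The Python sketch below checks whether a given time matches an expression. It handles only an illustrative subset (*, single numbers, a-b ranges, comma lists, and /step after * or a range; no names like MON, no @daily shortcuts), and expand_field and cron_matches are hypothetical names, not part of any cron implementation.

```python
from datetime import datetime

# (low, high) bounds for: minute, hour, day of month, month, day of week
FIELD_BOUNDS = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 7)]


def expand_field(field: str, low: int, high: int) -> set[int]:
    """Expand one cron field such as "*/15", "1-5" or "0,30" into its allowed values."""
    values: set[int] = set()
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = low, high
        elif "-" in base:
            first, last = base.split("-")
            start, end = int(first), int(last)
        else:
            start = end = int(base)
        values.update(range(start, end + 1, step))
    return values


def cron_matches(expr: str, when: datetime) -> bool:
    """Return True if the 5-field cron expression fires at `when` (to the minute)."""
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError("expected 5 fields: minute hour day-of-month month day-of-week")
    minutes, hours, doms, months, dows = (
        expand_field(f, lo, hi) for f, (lo, hi) in zip(fields, FIELD_BOUNDS)
    )
    if 7 in dows:
        dows.add(0)  # both 0 and 7 mean Sunday
    cron_weekday = (when.weekday() + 1) % 7  # Python: Monday=0; cron: Sunday=0
    dom_ok = when.day in doms
    dow_ok = cron_weekday in dows
    if fields[2].startswith("*") or fields[4].startswith("*"):
        day_ok = dom_ok and dow_ok  # at most one day field is restricted
    else:
        day_ok = dom_ok or dow_ok  # both restricted: either one is enough
    return (when.minute in minutes and when.hour in hours
            and when.month in months and day_ok)
```

For example, cron_matches("0 2 * * *", datetime(2024, 1, 1, 2, 0)) is True. One detail the "trivial" syntax hides: when both day-of-month and day-of-week are restricted, cron fires if either matches, so 0 0 13 * 5 means "the 13th, or any Friday".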

What cron doesn't do

  • Dependency management. "Run B only after A succeeds" requires chaining commands with && or writing a wrapper script
  • Retries with backoff. If A fails, cron has no retry logic — you write it yourself
  • Backfills. If you missed last week's run, cron can't catch up
  • Parameter passing. Passing yesterday's date to today's run is manual
  • Observability. No UI showing "what ran when, how long it took, did it succeed"
  • Cross-host coordination. Job runs only on the host where it's scheduled
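The retry and dependency bullets above describe glue you end up writing yourself. A minimal Python sketch of that glue, assuming each step is a zero-argument callable that raises on failure: retry with exponential backoff, then run steps strictly in order. run_with_retries, run_chain and shell_step are hypothetical names, and the delays are illustrative.

```python
import subprocess
import time
from typing import Callable, Sequence


def run_with_retries(task: Callable[[], None], retries: int = 3,
                     base_delay: float = 60.0,
                     sleep: Callable[[float], None] = time.sleep) -> None:
    """Call task(); if it raises, retry up to `retries` more times, doubling the delay."""
    for attempt in range(retries + 1):
        try:
            task()
            return
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the last failure
            sleep(base_delay * 2 ** attempt)  # 60s, 120s, 240s, ... with the defaults


def run_chain(steps: Sequence[tuple[str, Callable[[], None]]], **retry_kwargs) -> list[str]:
    """Run named steps in order; a step starts only after every earlier step succeeded."""
    completed: list[str] = []
    for name, task in steps:
        run_with_retries(task, **retry_kwargs)  # a final failure raises and halts the chain
        completed.append(name)
    return completed


def shell_step(command: str) -> Callable[[], None]:
    """Wrap a shell command as a step; a non-zero exit raises CalledProcessError."""
    def step() -> None:
        subprocess.run(command, shell=True, check=True)
    return step
```

Calling run_chain([("extract", shell_step("extract.sh")), ("transform", shell_step("transform.sh")), ("load", shell_step("load.sh"))]) would replace an && chain, assuming the scripts are on PATH. Even then you still have no UI, no backfills, and no cross-host coordination.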

What Airflow adds

Feature                 How Airflow does it
Dependency graphs       Define DAGs in Python: task_a >> task_b >> task_c
Retries                 Per task: retries=3, retry_delay=timedelta(minutes=5)
Backfills               Re-run a past date range (the backfill command in the CLI, or clearing past runs in the UI)
Templated parameters    Jinja: {{ ds }} renders the run's logical (execution) date as YYYY-MM-DD
Web UI                  See every run, every task, every log
Distributed execution   Workers across many hosts (Celery, Kubernetes, etc.)
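The dependency-graph row is the core idea: a DAG runner starts each task only after all of its upstream tasks have finished. A minimal sketch of that ordering using Python's standard-library graphlib (Python 3.9+); it illustrates the concept only and is not how Airflow's scheduler is implemented. run_dag and the task names are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable


def run_dag(deps: dict[str, set[str]], run_task: Callable[[str], None]) -> list[str]:
    """Run every task after all of its upstream tasks; return the execution order.

    deps maps each task name to the set of task names it depends on.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError (a ValueError) if the graph has a cycle
    order: list[str] = []
    while sorter.is_active():
        for task in sorter.get_ready():  # every task whose upstreams are all done
            run_task(task)               # a real runner could start these in parallel
            order.append(task)
            sorter.done(task)
    return order
```

For a fan-out/fan-in graph written in Airflow notation as extract >> [transform_a, transform_b] >> load, the map is {"transform_a": {"extract"}, "transform_b": {"extract"}, "load": {"transform_a", "transform_b"}}: extract always runs first, load always runs last, and the two transforms are free to run concurrently.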

Code comparison

Cron approach

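# crontab (each entry must fit on one line; crontab has no line continuation)
# cron runs with a minimal PATH, so the scripts must be on it or be given as absolute paths
0 2 * * * /bin/bash -c 'extract.sh && transform.sh && load.sh' >> /var/log/etl.log 2>&1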

If transform.sh fails, load.sh doesn't run. But there is no automatic retry, no record of which step failed beyond whatever the scripts print to the log, and no built-in way to re-run yesterday's data.
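Re-running past dates under cron means scripting the loop yourself. A minimal Python sketch, assuming (hypothetically) that each script accepts the date as its first argument in the same YYYY-MM-DD form as Airflow's ds variable. backfill_dates and backfill are illustrative names, and dry_run defaults to True so nothing executes until you opt in.

```python
import subprocess
from datetime import date, timedelta


def backfill_dates(start: date, end: date) -> list[str]:
    """Every date from start to end, inclusive, formatted YYYY-MM-DD."""
    return [(start + timedelta(days=i)).isoformat()
            for i in range((end - start).days + 1)]


def backfill(start: date, end: date, scripts: list[str],
             dry_run: bool = True) -> list[list[str]]:
    """Re-run the script chain once per date, passing the date as the first argument.

    Returns the commands in execution order. With dry_run=False, the first
    non-zero exit raises CalledProcessError and stops the backfill (like &&).
    """
    commands: list[list[str]] = []
    for ds in backfill_dates(start, end):
        for script in scripts:
            cmd = [script, ds]
            commands.append(cmd)
            if not dry_run:
                subprocess.run(cmd, check=True)
    return commands
```

Re-running only yesterday is backfill(d, d, scripts) with d = date.today() - timedelta(days=1). Compared with Airflow, there is still no per-date record of success, so after a failure you have to work out by hand where to resume.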

Airflow approach

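from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator  # Airflow 2.x import path

with DAG(
    dag_id='etl',
    start_date=datetime(2024, 1, 1),
    schedule='0 2 * * *',  # same cron expression; `schedule` replaces the deprecated `schedule_interval`
    catchup=False,         # don't auto-run the missed intervals between start_date and now
) as dag:
    # Trailing space: without it, Airflow treats a command ending in .sh as a Jinja template file to load
    extract = BashOperator(task_id='extract', bash_command='extract.sh ',
                           retries=3, retry_delay=timedelta(minutes=5))
    transform = BashOperator(task_id='transform', bash_command='transform.sh ',
                             retries=2)
    load = BashOperator(task_id='load', bash_command='load.sh ')

    # load runs only after transform succeeds, which runs only after extract succeeds
    extract >> transform >> load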

When to graduate from cron

You've outgrown cron when any of these is true:

  • You're chaining 3+ shell scripts via &&
  • You've written your own retry-on-failure logic
  • You've built dashboards to see what ran
  • You need to re-run jobs for specific past dates
  • You need to pass output from one job to another

Lighter alternatives to Airflow

Airflow is heavyweight (database, scheduler, webserver, workers). For mid-complexity workflows, consider:

  • Prefect / Dagster — modern Python-native DAG runners
  • systemd timers — for sequencing on a single host
  • GitHub Actions / GitLab CI — if your jobs already live in a Git repo
  • AWS Step Functions / Argo Workflows — for cloud-native graphs