What cron does well
- Schedule a single command at a recurring time
- Zero infrastructure: preinstalled on most Linux/Unix systems
- Trivial syntax (once you know it)
- Widely understood by sysadmins
What cron doesn't do
- Dependency management. "Run B only after A succeeds" requires && chaining or a wrapper script
- Retries with backoff. If A fails, cron has no retry logic — you write it yourself
- Backfills. If you missed last week's run, cron can't catch up
- Parameter passing. Passing yesterday's date to today's run is manual
- Observability. No UI showing "what ran when, how long it took, did it succeed"
- Cross-host coordination. A job runs only on the host whose crontab defines it
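To make the "you write it yourself" point concrete, here is a minimal sketch of the kind of retry wrapper a cron job tends to accumulate. The function name `run_with_retries` and its parameters are hypothetical, chosen for illustration. The `run` and `sleep` arguments default to the standard library calls and exist only so the logic can be exercised without waiting.

```python
import subprocess
import time


def run_with_retries(cmd, retries=3, base_delay=1.0,
                     run=subprocess.run, sleep=time.sleep):
    """Run cmd, retrying on a non-zero exit code with exponential backoff.

    Returns the number of attempts used; raises RuntimeError if every
    attempt fails.
    """
    attempts = retries + 1  # the first try plus `retries` retries
    for attempt in range(1, attempts + 1):
        result = run(cmd)
        if result.returncode == 0:
            return attempt
        if attempt < attempts:
            # Wait base_delay, then 2x, then 4x, ... between attempts.
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(f"{cmd!r} failed after {attempts} attempts")
```

A crontab entry would then call a small script built around this function instead of calling the job directly. That wrapper is exactly the extra code Airflow's per-task retry settings replace.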
What Airflow adds
| Feature | How Airflow does it |
|---|---|
| Dependency graphs | Define DAGs in Python — task_a >> task_b >> task_c |
| Retries | Per-task: retries=3, retry_delay=timedelta(minutes=5) |
| Backfills | Re-run any past date range with the backfill CLI command, or clear past runs in the UI to re-run them |
| Templated parameters | Jinja: {{ ds }} renders the run's logical date as YYYY-MM-DD |
| Web UI | See every run, every task, every log |
| Distributed execution | Workers across many hosts (Celery, Kubernetes, etc.) |
Code comparison
Cron approach
# crontab (an entry must fit on one line; cron does not support backslash continuation)
0 2 * * * /bin/bash -c 'extract.sh && transform.sh && load.sh' >> /var/log/etl.log 2>&1
If transform.sh fails, load.sh doesn't run. But you get no automatic retry, no clear record of which step failed, and no way to re-run yesterday's data.
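Re-running past dates with cron usually means a hand-rolled loop like the sketch below. It assumes, hypothetically, that the job script accepts the target date as its first argument in YYYY-MM-DD form. The names `dates_between` and `backfill` are illustrative and not part of any library.

```python
import subprocess
from datetime import date, timedelta


def dates_between(start, end):
    """Yield every date from start to end, inclusive."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


def backfill(script, start, end, run=subprocess.run):
    """Run `script YYYY-MM-DD` once per day; return the dates that failed."""
    failed = []
    for day in dates_between(start, end):
        # Assumption: the script takes the date to process as its only argument.
        result = run([script, day.isoformat()])
        if result.returncode != 0:
            failed.append(day)
    return failed
```

For example, `backfill("./etl.sh", date(2024, 1, 1), date(2024, 1, 7))` would replay one week and return the days that still need attention. Tracking those failures across invocations is left to you, which is the bookkeeping a scheduler with built-in backfills takes over.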
Airflow approach
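from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

# schedule_interval is the Airflow 2.x argument name; 2.4+ also accepts schedule.
with DAG('etl', start_date=datetime(2024, 1, 1),
         schedule_interval='0 2 * * *',
         catchup=False) as dag:  # catchup=False: don't auto-run missed intervals
    # The trailing space keeps Airflow from loading the .sh path as a Jinja template file.
    extract = BashOperator(task_id='extract', bash_command='extract.sh ',
                           retries=3, retry_delay=timedelta(minutes=5))
    transform = BashOperator(task_id='transform', bash_command='transform.sh ',
                             retries=2)
    load = BashOperator(task_id='load', bash_command='load.sh ')

    # Each task runs only after the task to its left succeeds.
    extract >> transform >> load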
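The `extract >> transform >> load` line declares ordering rather than running anything. As a rough illustration of what a dependency graph gives a scheduler, the sketch below resolves the same three tasks into a run order using the standard library's `graphlib`. This is not how Airflow is implemented; the `dependencies` mapping and `execution_order` helper are hypothetical names used only to show the idea.

```python
from graphlib import TopologicalSorter

# Map each task to the set of tasks it depends on (its upstream tasks).
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}


def execution_order(deps):
    """Return a run order in which every task follows its upstream tasks."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)  # ['extract', 'transform', 'load']
```

With a graph instead of a && chain, a scheduler can tell which tasks are independent, retry a single failed node, and resume from that node rather than restarting the whole pipeline.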
When to graduate from cron
You've outgrown cron when any of these is true:
- You're chaining 3+ shell scripts via &&
- You've written your own retry-on-failure logic
- You've built dashboards to see what ran
- You need to re-run jobs for specific past dates
- You need to pass output from one job to another
Lighter alternatives to Airflow
Airflow is heavyweight (database, scheduler, webserver, workers). For mid-complexity workflows, consider:
- Prefect / Dagster — modern Python-native DAG runners
- systemd timers — for sequencing on a single host
- GitHub Actions / GitLab CI — if your jobs already live in a Git repo
- AWS Step Functions / Argo Workflows — for cloud-native graphs