Airflow: Downloading a File from a URL
To download a file from a URL using Airflow, you can use several methods depending on your target storage: the local filesystem, Amazon S3, or Google Cloud Storage (GCS).

1. Using the PythonOperator (Local Filesystem)

The most flexible way to download a file is to write a custom Python function and execute it with the PythonOperator. This approach uses the standard requests library to fetch the file and save it to a local directory.

    import requests
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def download_file_func(url, save_path):
        # Fetch the file over HTTP and write it to the local filesystem.
        response = requests.get(url)
        with open(save_path, 'wb') as f:
            f.write(response.content)
        print(f"File downloaded to {save_path}")

    with DAG('download_dag', start_date=datetime(2023, 1, 1), schedule_interval='@daily') as dag:
        download_task = PythonOperator(
            task_id='download_file',
            python_callable=download_file_func,
            op_kwargs={
                'url': 'https://example.com',
                'save_path': '/tmp/data.csv',
            },
        )
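For large files, a streaming download avoids holding the entire response in memory. The function below is a sketch of that variation, assuming the same requests library as above; the chunk_size value and the raise_for_status() check are illustrative additions, not part of the original example. It can be passed to python_callable in place of download_file_func.

    def download_file_streaming(url, save_path, chunk_size=8192):
        # Stream the response so the file is written in chunks
        # instead of being loaded fully into memory.
        with requests.get(url, stream=True) as response:
            response.raise_for_status()  # fail the task on HTTP errors
            with open(save_path, 'wb') as f:
                for chunk in response.iter_content(chunk_size=chunk_size):
                    f.write(chunk)
        print(f"File downloaded to {save_path}")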
2. Using the HttpToS3Operator (Direct to S3)

If your goal is to move a file directly from a URL to an Amazon S3 bucket, the HttpToS3Operator is the most efficient choice. It eliminates manual file handling by skipping the local filesystem entirely and integrates with Airflow's connection system, using an HTTP connection for the source and an AWS connection for the destination.
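A minimal sketch of such a task is shown below. It assumes the Amazon provider package (apache-airflow-providers-amazon) is installed and that connections named http_default and aws_default already exist; the import path and parameter names shown here (endpoint, s3_bucket, s3_key, aws_conn_id) can vary between provider versions, so verify them against the provider documentation.

    from airflow.providers.amazon.aws.transfers.http_to_s3 import HttpToS3Operator

    # Assumed setup: 'http_default' points at the source host and
    # 'aws_default' holds credentials for the target bucket.
    download_to_s3 = HttpToS3Operator(
        task_id='download_to_s3',
        http_conn_id='http_default',
        endpoint='/data/export.csv',    # hypothetical path appended to the HTTP connection's host
        s3_bucket='my-example-bucket',  # hypothetical bucket name
        s3_key='raw/export.csv',
        aws_conn_id='aws_default',
    )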
3. Using the BashOperator with wget or curl

For simple downloads where you don't want to manage Python dependencies, you can use the BashOperator to run standard CLI tools like wget or curl.

    from airflow.operators.bash import BashOperator

    download_bash = BashOperator(
        task_id='download_with_wget',
        bash_command='wget -O /tmp/downloaded_file.zip https://example.com',
    )
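If wget is not available in your worker image, an equivalent task can use curl instead. The flags below (-L to follow redirects, --fail to exit non-zero on HTTP errors, -o for the output path) are standard curl options, and the task id is only illustrative.

    download_curl = BashOperator(
        task_id='download_with_curl',
        bash_command='curl -L --fail -o /tmp/downloaded_file.zip https://example.com',
    )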
