How to use a wildcard to search for S3 files with S3KeySensor in Airflow

Time:02-12

Hello, I am using S3KeySensor to look for parquet files created in a specific partition. The file names are generated by Spark (for example part-00499-e91c1af8-4352-4de9*), so what should the bucket_key be? The code below is failing.

    ex:bucket_key=f"inbound/phix/empnf/datasetdate={var_ds_date}/*.parquet"

s3_data_filechk = S3KeySensor(
    task_id='s3_data_filechk',
    bucket_name=data_bucket_name,
    bucket_key=f"inbound/phix/empnf/datasetdate={var_ds_date}/*.parquet/",
    timeout=60 * 30,  # seconds; time out after 30 minutes
    poke_interval=60 * 5,  # seconds; check for the file every five minutes
)

CodePudding user response:

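S3KeySensor has a wildcard_match parameter that controls whether bucket_key is interpreted as a Unix wildcard pattern. It defaults to False, so the key is matched literally unless you set wildcard_match=True. Also drop the trailing slash after *.parquet: the object keys end in .parquet, so a pattern ending in .parquet/ cannot match them.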

Example:

sensor = S3KeySensor(
    ...
    bucket_key="*.parquet",
    wildcard_match=True,
)
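To check a pattern before putting it in a DAG, a rough local test with Python's standard fnmatch module can help. This is only a sketch: it assumes the sensor's wildcard matching follows fnmatch-style Unix rules. The date value and the full file name are hypothetical, made up for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical values for illustration only; the real date comes from the DAG
# and the real file name is whatever Spark generates.
var_ds_date = "2022-02-12"
key = (
    f"inbound/phix/empnf/datasetdate={var_ds_date}/"
    "part-00499-example.snappy.parquet"
)

# Pattern from the question, without and with the trailing slash.
good_pattern = f"inbound/phix/empnf/datasetdate={var_ds_date}/*.parquet"
bad_pattern = good_pattern + "/"

# fnmatchcase applies Unix shell-style wildcard rules, case-sensitively.
matches_good = fnmatchcase(key, good_pattern)
matches_bad = fnmatchcase(key, bad_pattern)

print("without trailing slash:", matches_good)
print("with trailing slash:", matches_bad)
```

Under fnmatch rules, * also matches "/" characters, so a bare *.parquet pattern would match parquet keys under any prefix. Keeping the partition path in bucket_key narrows the match to the intended partition.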