Hello, I am using S3KeySensor to look for Parquet files created in a specific partition. The file names are generated by Spark (for example part-00499-e91c1af8-4352-4de9*), so what should bucket_key be? The code below is failing.
For example: bucket_key=f"inbound/phix/empnf/datasetdate={var_ds_date}/*.parquet"
s3_data_filechk = S3KeySensor(
    task_id='s3_data_filechk',
    bucket_name=data_bucket_name,
    bucket_key=f"inbound/phix/empnf/datasetdate={var_ds_date}/*.parquet/",
    timeout=60 * 30,  # time out after 30 minutes
    poke_interval=60 * 5,  # seconds; check for the file every five minutes
)
Answer:
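S3KeySensor has a wildcard_match parameter that controls whether bucket_key is interpreted as a Unix wildcard pattern. It defaults to False, so the * in your key is treated literally and the sensor never matches the Spark-generated file names. Set wildcard_match=True, and drop the trailing / after *.parquet, since the object keys end in .parquet rather than in a slash.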
Example:
sensor = S3KeySensor(
    ...
    bucket_key="*.parquet",
    wildcard_match=True,
)
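If you want to sanity-check a pattern before deploying the DAG, a rough sketch like the one below may help. It uses Python's standard fnmatch module, on the assumption that the sensor applies the same Unix-style matching to object keys when wildcard_match=True. The date value, the object keys, and the matching_keys helper are all made up for illustration; they are not taken from your bucket or from Airflow.

```python
from fnmatch import fnmatch

# Hypothetical partition date and object keys, for illustration only.
var_ds_date = "2022-01-01"
prefix = f"inbound/phix/empnf/datasetdate={var_ds_date}"
pattern = f"{prefix}/*.parquet"

keys = [
    f"{prefix}/part-00000-abc.snappy.parquet",  # made-up Spark-style file name
    f"{prefix}/_SUCCESS",                       # marker file, should not match
]


def matching_keys(pattern, keys):
    """Return the keys that match a Unix-style wildcard pattern."""
    return [k for k in keys if fnmatch(k, pattern)]


# Pattern without a trailing slash: matches the Parquet file only.
matches = matching_keys(pattern, keys)

# Pattern with a trailing slash, as in the question: matches nothing,
# because no object key ends in "/".
trailing_slash_matches = matching_keys(pattern + "/", keys)

print(matches)
print(trailing_slash_matches)
```

If the first list contains your Parquet key and the second is empty, the same pattern without the trailing slash is probably what you want to pass as bucket_key, together with wildcard_match=True.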