S3 bucket sensor for new file

Time:08-25

I am working on an ETL pipeline using Airflow running in Docker. I want to trigger my pipeline whenever a new file is uploaded to an S3 bucket. Is there an S3 sensor in Airflow that checks for new files in a bucket? The sensor should ignore files that already exist in the location and trigger only when a new file is added to S3.

CodePudding user response:

You have several options to achieve this goal:

  • The best solution is to create S3 Event Notifications on object creation that send a message to an SQS queue. In Airflow you can then create a sensor that checks the queue for new messages and processes them.
  • You can also create a sensor that lists the files in the S3 bucket and adds them to a state store (a database, for example) with the state to_process. On the next run, it compares the file list with the files in the state store to determine whether there are new files. Your DAG then processes the records in the state store whose state != done and, when processing finishes, updates their state to done. You can add other metadata such as created_at and processed_at, and other states such as error, so that failed files are reprocessed in the next run or an alert is sent to your team.
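
As a rough illustration of the two options above, here is a minimal sketch in plain Python, not a complete Airflow sensor. Everything named here is an assumption made for the example: the helper functions, the s3_files table, and the use of SQLite as the state store are hypothetical, and the message parser assumes the S3 event notification JSON is delivered to SQS directly (not wrapped by SNS). Inside a sensor or task you would feed these helpers with the message bodies read from the queue or with the keys returned by listing the bucket.

```python
import json
import sqlite3
from urllib.parse import unquote_plus


def keys_from_s3_event(body):
    """Option 1: extract created object keys from one S3 event notification body."""
    event = json.loads(body)
    keys = []
    for record in event.get("Records", []):
        # Only object-creation events count as a "new file" here.
        if record.get("eventName", "").startswith("ObjectCreated:"):
            # S3 URL-encodes keys in notifications (a space arrives as "+").
            keys.append(unquote_plus(record["s3"]["object"]["key"]))
    return keys


def init_store(conn):
    """Option 2: create the (hypothetical) state table if it does not exist."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS s3_files ("
        "object_key TEXT PRIMARY KEY, "
        "state TEXT NOT NULL, "
        "created_at TEXT DEFAULT CURRENT_TIMESTAMP, "
        "processed_at TEXT)"
    )
    conn.commit()


def register_new_keys(conn, listed_keys, state="to_process"):
    """Insert keys not yet in the store and return only the newly added ones."""
    new_keys = []
    for key in listed_keys:
        cur = conn.execute(
            "INSERT OR IGNORE INTO s3_files (object_key, state) VALUES (?, ?)",
            (key, state),
        )
        if cur.rowcount == 1:  # 0 means the key was already known
            new_keys.append(key)
    conn.commit()
    return new_keys


def pending_keys(conn):
    """Return every key whose state is not 'done' (includes 'error' rows)."""
    rows = conn.execute(
        "SELECT object_key FROM s3_files WHERE state != 'done' ORDER BY object_key"
    )
    return [row[0] for row in rows]


def mark_done(conn, key):
    """Record that a file was processed successfully."""
    conn.execute(
        "UPDATE s3_files SET state = 'done', processed_at = CURRENT_TIMESTAMP "
        "WHERE object_key = ?",
        (key,),
    )
    conn.commit()
```

To ignore the files that are already in the bucket, one possibility is to seed the store once with register_new_keys(conn, existing_keys, state="done"); later runs then report only keys that were added afterwards. For the SQS route, the Amazon provider package for Airflow includes an SqsSensor that may save you from writing the polling part yourself.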