I have a client who uploads a variable number of files to an S3 bucket. Currently, I have a Lambda that processes the files that go into that bucket, but it is triggered by each individual upload. The problem is that I have no idea how many files the client will upload at once: it could be one file, or it could be up to ten. I have some logic in the Lambda that returns a different output depending on how many files have been uploaded.
I came across the Stack Overflow question "Want to upload multiple files to S3 and only when all are uploaded trigger a lambda function", whose approach I really like the sound of. However, I am not sure how to set it up (or what the policies for multiple file uploads will be). I should be able to get my Lambda to subscribe to a topic easily enough, but how do I notify an SNS topic that a batch of files (which can vary in number) has been uploaded in a single instance?
Answer:
This is a common question!
Basically, 'something' needs to be able to say "All the files are here, start processing them!"
Since you have a variable number of files arriving, merely counting the files will not be sufficient. Instead, you'll need something else to trigger the processing, such as:
- The client providing a list of all files to be included (a 'manifest file'), or
- The client performing some action to say "Done, ready for processing", or
- Waiting a certain amount of time after the last upload (eg 10 minutes) and then assuming all files were provided
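The manifest and timeout options above could be sketched as two small pure functions. This is only an illustration under assumptions: the manifest format (one object key per line) and the function names are hypothetical, and the code that lists the bucket contents is left out.

```python
from datetime import datetime, timedelta


def missing_from_manifest(manifest_text: str, uploaded_keys) -> list:
    """Return manifest entries that have not been uploaded yet.

    Assumes a hypothetical manifest format: one object key per line.
    """
    expected = {line.strip() for line in manifest_text.splitlines() if line.strip()}
    return sorted(expected - set(uploaded_keys))


def quiet_period_elapsed(last_modified_times, now: datetime,
                         quiet: timedelta = timedelta(minutes=10)) -> bool:
    """True if at least one file exists and the newest upload is `quiet` old.

    `now` and the timestamps must be comparable (all timezone-aware or all
    naive). boto3 returns timezone-aware LastModified values.
    """
    if not last_modified_times:
        return False
    return now - max(last_modified_times) >= quiet
```

Processing would start once `missing_from_manifest` returns an empty list, or once `quiet_period_elapsed` returns True on a scheduled check.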
In the question you linked, the client would be responsible for sending a message to an Amazon SNS topic to trigger processing. This could be achieved by giving them a script file that runs the AWS CLI. This would need IAM credentials, but presumably they have these already, since they are uploading to S3.
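As a rough sketch of such a client-side script, here is a Python variant that uses boto3 instead of the AWS CLI. The message layout, the subject line and the function names are assumptions made for illustration. The SNS client is passed in, so the code does not depend on how credentials are configured.

```python
import json


def build_batch_message(bucket, keys):
    """Serialize the batch description (hypothetical layout) as JSON."""
    return json.dumps({"bucket": bucket, "keys": sorted(keys)})


def notify_batch_complete(sns_client, topic_arn, bucket, keys):
    """Publish a "batch complete" message and return the publish response."""
    return sns_client.publish(
        TopicArn=topic_arn,
        Subject="Batch upload complete",
        Message=build_batch_message(bucket, keys),
    )


# Usage on the client machine (needs boto3 and credentials that allow
# sns:Publish on the topic; the ARN and bucket name are placeholders):
#   import boto3
#   notify_batch_complete(
#       boto3.client("sns"),
#       "arn:aws:sns:us-east-1:123456789012:uploads-complete",
#       "my-upload-bucket",
#       ["a.csv", "b.csv"],
#   )
```

The subscribed Lambda function would then read the bucket and key list from the SNS message rather than from an S3 event.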
Other signalling methods from the client could be:
- After uploading the files, they upload one final file with a special name (eg no-more-files.txt), which means that processing should commence. The Lambda function could look for this name.
- They go to a web page and click a button, which triggers the Lambda function.
- They send an email to a special address, which triggers the Lambda function.
- They double-click some program/script you have given them.
- They delete a file in S3. For example, if they use Cyberduck to upload files with a nice user interface, they could delete a "HOLD" file, which would then trigger the Lambda function.
There are lots of ways. The best one depends on how many clients you have, their technical skills, whether you want to issue them with AWS credentials, and how they upload the files.
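The "special file name" option could look roughly like the Lambda handler below. It is a minimal sketch that assumes each batch is uploaded under its own key prefix and that the function is still triggered by S3 object-created events. `process_batch` is a hypothetical hook standing in for your own processing logic.

```python
from urllib.parse import unquote_plus

SENTINEL_NAME = "no-more-files.txt"  # the special file name suggested above


def find_completed_batches(event):
    """Return (bucket, prefix) for every sentinel upload in an S3 event."""
    batches = []
    for record in event.get("Records", []):
        # Only react to uploads, not to deletions or other event types.
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 event notifications are URL-encoded.
        key = unquote_plus(record["s3"]["object"]["key"])
        head, sep, name = key.rpartition("/")
        if name == SENTINEL_NAME:
            batches.append((bucket, head + sep))
    return batches


def lambda_handler(event, context, process_batch=print):
    """Ignore ordinary uploads; process a batch when its sentinel arrives."""
    batches = find_completed_batches(event)
    for bucket, prefix in batches:
        # Hypothetical hook: list the objects under `prefix` and process them.
        process_batch(bucket, prefix)
    return {"batches": len(batches)}
```

Every ordinary upload still invokes the function, but it returns immediately unless the sentinel file is among the uploaded keys.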