I already have thousands of objects in my S3 bucket. I have written a Lambda function that processes them; it gets triggered when a file is dropped into that S3 bucket. I would like to copy the objects matching a pattern and drop them back into the same bucket to trigger my Lambda. Currently I am using the method below, which takes a lot of time.
import boto3

s3 = boto3.resource('s3')
bucket_source = s3.Bucket('vistradata')

# List every key under the prefix, then keep the ones matching the pattern
key_list = [obj.key for obj in bucket_source.objects.filter(Prefix='data/')]
files = [key for key in key_list if 'mystring' in key]

def copy_data_from_s3(input_file):
    # Copy the object onto itself so the bucket notification fires again
    copy_source = {
        'Bucket': 'vistradata',
        'Key': input_file
    }
    s3.meta.client.copy(copy_source, 'vistradata', input_file)

for key in files:
    copy_data_from_s3(key)
Is there a better method using aws s3 sync or aws s3 cp? The examples I see online copy data from one bucket to another, not within the same bucket. Thank you.
CodePudding user response:
Yes, you could run a command like this to force the notification to trigger.
aws s3 sync s3://mybucket/ s3://mybucket/folder/
That would copy all the files inside the bucket to a new folder inside the bucket and trigger notifications for each.
You could also run the sync first with notifications disabled, then run it in the reverse direction if need be.
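If you only need the keys that match your pattern, sync and cp both accept --exclude/--include filters. A minimal sketch reusing the bucket, prefix, and pattern from the question (the data-retrigger destination folder is just a made-up example name):

aws s3 cp s3://vistradata/data/ s3://vistradata/data-retrigger/ \
    --recursive --exclude "*" --include "*mystring*"

Because the later filter wins, everything is excluded first and only the keys containing mystring are copied, each firing the notification.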
CodePudding user response:
You could skip the S3 copying altogether. Your existing loop over the matching keys can invoke your Lambda directly for each file, which is essentially what S3 does for you. Your event payload would be a stripped-down version of the S3 notification event with only the bucket, key, or whatever fields your handler needs.
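For example, here is a minimal sketch of that direct invocation, assuming the files list from your script; process-file is a hypothetical name, so substitute your Lambda's actual function name:

import json
import boto3

lambda_client = boto3.client('lambda')

for key in files:
    # Stripped-down S3 notification event: only the fields the handler reads
    event = {
        'Records': [{
            's3': {
                'bucket': {'name': 'vistradata'},
                'object': {'key': key}
            }
        }]
    }
    # InvocationType='Event' invokes asynchronously, like an S3 notification
    lambda_client.invoke(
        FunctionName='process-file',  # hypothetical; use your Lambda's name
        InvocationType='Event',
        Payload=json.dumps(event)
    )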
This will be faster and cheaper than copying S3 objects, but if this is a one-time operation, who cares? Your approach or @Coin-Graham's will also get the job done.