I have lots of files in a "source" S3 bucket and I want to copy them to a "dest" bucket. Meanwhile, new files keep being uploaded to the "source" bucket. Will the paginator see the newly uploaded files? And if not, how can I track them, given that paginating again is costly?
My code (using aws-sdk-go-v2):
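```go
import (
	"context"
	"log"

	"github.com/aws/aws-sdk-go-v2/service/s3"
)

// copyAll (hypothetical name) lists every object in bucket and copies it.
func copyAll(ctx context.Context, client *s3.Client, bucket *string) {
	paginator := s3.NewListObjectsV2Paginator(client, &s3.ListObjectsV2Input{
		Bucket: bucket,
	})
	for paginator.HasMorePages() {
		page, err := paginator.NextPage(ctx)
		if err != nil {
			log.Printf("error: %+v", err)
			return
		}
		for _, obj := range page.Contents {
			_ = obj // copy object here, e.g. with client.CopyObject using obj.Key
		}
	}
}
```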
Answer:
As mentioned in the comments, you should definitely test your code, but it's worth mentioning that since December 2020, S3 provides strong read-after-write consistency.
In other words,
all S3 GET, PUT, and LIST operations, as well as operations that change object tags, ACLs, or metadata, are now strongly consistent. What you write is what you will read, and the results of a LIST will be an accurate reflection of what’s in the bucket. This applies to all existing and new S3 objects, works in all regions [...]
More on that in this blog post.
And according to the AWS SDK for Go documentation:
Pagination methods iterate over a list operation until the method retrieves the last page of results or until the callback function returns false
Answer:
If you're simply reading the files and writing them unchanged to a second bucket, S3 can do this for you with bucket replication. Note that replication only applies to objects written after it is enabled; existing objects can be copied with S3 Batch Replication.
If you need to do some processing beyond what S3's built-in replication can do, the best way is Amazon EventBridge, which can react automatically to newly created objects. For resiliency, I recommend connecting EventBridge to an SQS queue and the queue to a Lambda function. You can then run an idempotent version of your syncing program to pick up any objects that arrived before EventBridge was set up. You'll need some way to tell whether an object has already been replicated (object tags might help here); then you can rerun the program until all objects are synced.
Either way, new objects will be synced in the future.