s3sync() Exclude Directory

Time:10-12

I'm trying to pull down all files in a given bucket, except those in a specific directory, using R.

In the AWS CLI, I can use...
aws s3 sync s3://my_bucket/my_prefix ./my_destination --exclude="*bad_directory*"

In aws.s3::s3sync(), I'd like to do something like...
aws.s3::s3sync(path='./my_destination', bucket='my_bucket', prefix='my_prefix', direction='download', exclude='*bad_directory*')
...but exclude is not a supported argument.

Is this possible using aws.s3 (or paws for that matter)?

Please don't recommend using the AWS CLI; there are reasons that approach doesn't make sense for my purpose.

Thank you!!

User response:

Here's what I came up with to solve this...

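library(paws)
library(aws.s3)

# List the objects under the prefix (a single call returns at most 1,000 keys)
s3 <- paws::s3()
contents <- s3$list_objects(Bucket = 'my_bucket', Prefix = 'my_prefix/')$Contents

# Keep every key that is not inside bad_directory
keys <- vapply(contents, function(x) x$Key, character(1))
keys <- keys[!grepl('/bad_directory/', keys, fixed = TRUE)]
# Skip "folder" placeholder keys, which end in a slash and cannot be saved as files
keys <- keys[!endsWith(keys, '/')]

for (key in keys) {
    # Create the local directory tree before writing the file
    dir.create(dirname(key), showWarnings = FALSE, recursive = TRUE)

    aws.s3::save_object(
        object = key,
        bucket = 'my_bucket',
        file = key
    )
}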
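
One caveat with the listing step: a single list call returns at most 1,000 keys, so a larger prefix would be silently truncated. Below is a minimal sketch of paging through the results with list_objects_v2 and its continuation token. The helper name list_all_keys is hypothetical, and client is assumed to be the object returned by paws::s3().

```r
# Hypothetical helper: collect every key under a prefix, one page at a time.
# `client` is assumed to be a paws S3 client, e.g. paws::s3().
list_all_keys <- function(client, bucket, prefix) {
    keys <- character(0)
    token <- NULL
    repeat {
        args <- list(Bucket = bucket, Prefix = prefix)
        # Only send the token once a previous page has supplied one
        if (!is.null(token)) args$ContinuationToken <- token
        page <- do.call(client$list_objects_v2, args)
        keys <- c(keys, vapply(page$Contents, function(x) x$Key, character(1)))
        # Stop when the service reports there are no further pages
        if (!isTRUE(page$IsTruncated)) break
        token <- page$NextContinuationToken
    }
    keys
}

# Usage (requires AWS credentials, so it is left commented out):
# keys <- list_all_keys(paws::s3(), 'my_bucket', 'my_prefix/')
```

The returned vector can then be filtered and downloaded in the same way as keys in the loop above.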

Still open to more efficient implementations - thanks!
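
If the filter should look more like the CLI's --exclude="*bad_directory*", one option is to translate the wildcard pattern into a regular expression with utils::glob2rx(). This is only a sketch: the helper name exclude_keys is hypothetical, the sample keys are made up for illustration, and the matching is an approximation that assumes only the * and ? wildcards rather than the CLI's full pattern rules.

```r
# Hypothetical helper: drop keys that match a shell-style wildcard pattern.
exclude_keys <- function(keys, pattern) {
    # glob2rx() turns a wildcard pattern into an equivalent regular expression
    keys[!grepl(utils::glob2rx(pattern), keys)]
}

# Made-up keys, used only to illustrate the filter
sample_keys <- c(
    'my_prefix/a.csv',
    'my_prefix/bad_directory/b.csv',
    'my_prefix/sub/c.csv'
)
kept <- exclude_keys(sample_keys, '*bad_directory*')
print(kept)
```

Note that this pattern also drops any key whose file name merely contains bad_directory, which is the same trade-off the CLI pattern makes; the fixed '/bad_directory/' match above is stricter.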
