I'm trying to pull down all files in a given bucket, except those in a specific directory, using R.
With the AWS CLI, I can use...
aws s3 sync s3://my_bucket/my_prefix ./my_destination --exclude="*bad_directory*"
In aws.s3::s3sync(), I'd like to do something like...
aws.s3::s3sync(path='./my_destination', bucket='my_bucket', prefix='my_prefix', direction='download', exclude='*bad_directory*')
...but exclude is not a supported argument.
Is this possible using aws.s3 (or paws for that matter)?
Please don't recommend using the AWS CLI - there are reasons that approach doesn't make sense for my purpose.
Thank you!!
Answer:
Here's what I came up with to solve this...
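library(paws)
library(aws.s3)

s3 <- paws::s3()

# List every key under the prefix. Each call returns at most 1000 keys,
# so keep following the continuation token until the listing is complete.
keys <- character(0)
token <- NULL
repeat {
  resp <- s3$list_objects_v2(Bucket = 'my_bucket', Prefix = 'my_prefix/', ContinuationToken = token)
  keys <- c(keys, vapply(resp$Contents, function(x) x$Key, character(1)))
  if (!isTRUE(resp$IsTruncated)) break
  token <- resp$NextContinuationToken
}

# Drop anything under bad_directory, plus "folder" placeholder keys ending in "/".
keys <- keys[!grepl('/bad_directory/', keys, fixed = TRUE) & !endsWith(keys, '/')]

# Download each remaining object to a local path that mirrors its key.
for (key in keys) {
  dir.create(dirname(key), showWarnings = FALSE, recursive = TRUE)
  aws.s3::save_object(object = key, bucket = 'my_bucket', file = key)
}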
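If the filter should behave more like the CLI's --exclude glob than a fixed substring, one option is to translate the pattern with utils::glob2rx() and filter the key vector before downloading. This is only a minimal sketch: exclude_keys is a hypothetical helper, not part of aws.s3 or paws, the sample keys are made up, and the wildcard rules may not match the CLI's in every case.

```r
# Hypothetical helper (not part of aws.s3 or paws): drop every key that
# matches at least one wildcard pattern such as "*bad_directory*".
exclude_keys <- function(keys, exclude) {
  if (length(keys) == 0 || length(exclude) == 0) return(keys)
  # utils::glob2rx() converts wildcard patterns into anchored regular expressions.
  patterns <- utils::glob2rx(exclude)
  # Build one logical vector per pattern, then OR them together.
  excluded <- Reduce(`|`, lapply(patterns, grepl, x = keys), rep(FALSE, length(keys)))
  keys[!excluded]
}

# Made-up keys, for illustration only.
keys <- c(
  "my_prefix/a.csv",
  "my_prefix/bad_directory/b.csv",
  "my_prefix/sub/c.csv"
)
kept <- exclude_keys(keys, "*bad_directory*")
print(kept)
```

The result can then be passed to the download loop in place of the grepl() filter, and several patterns can be supplied at once as a character vector.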
Still open to more efficient implementations - thanks!