AWS S3 downloads all from a bucket when I specify what to download specifically. Behaves different i

Time:12-06

So I'm trying to execute this (obfuscated for security) command:

aws s3 cp s3://bucket/subfolder/ /storage/ --recursive --exclude '*' --include 'a.data' --include 'b.data' --include 'c.data' .... and so on.

When I run this from the command line, everything works as expected.

However, when I run a bash script that should run that command, aws tries to download all the files in that subfolder. I have checked with ps and found the exact command being used:

ubuntu   1761765  114  2.3 1206204 93252 pts/3   Sl   18:47   0:06 /usr/bin/python3 /usr/bin/aws s3 cp s3://buckt/subfolder/ /storage/ --recursive --exclude '*' --include 'a.data' --include 'b.data' --include 'c.data' ....

I get the same thing even when I simply run this:

ubuntu   1761765  114  2.3 1206204 93252 pts/3   Sl   18:47   0:06 /usr/bin/python3 /usr/bin/aws s3 cp s3://buckt/subfolder/ /storage/ --recursive --exclude '*'

Does anyone have any idea what's going on here? It's like it's ignoring everything after --recursive.

I have tried modifying my command, using ps to find what is actually being executed, and checking to make sure the correct user is running the command.

I am ultimately trying to build a long --include 'filename' string to download many files at once. Using bash for loops is way too slow.

Edit: Here is my bash script more or less:

includeList="--exclude '*' --recursive "
while [ $i -ne $cnt ] # while i != count
do
    #download the ith files
    f=${allFiles[$i]}
    includeList="${includeList}--include '$f' "
    i=$(( $i + 1 ))
    mod=$(($i))
    if [ $mod -eq 0 ]; then
       aws s3 cp s3://bucket/$1/ /storage/ ${includeList}
       exit 0
    fi
 done

CodePudding user response:

--exclude and --include are filters applied to the files being copied. They are evaluated in the order given, so later filters take precedence, and they are normally combined with --recursive. Adding --dryrun simulates the command, so you can check which files would be copied without transferring anything:

aws s3 cp s3://buckt/subfolder/ /storage/ --recursive --exclude '*' --include 'a.data' --include 'b.data' --include 'c.data' --dryrun

CodePudding user response:

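The * that you're passing to the AWS CLI is not meant to be globbed by the shell. Normally, quoting it is enough, but since you're including it in a variable, it gets a bit more complicated: quote characters stored inside the string are not reinterpreted when the variable is expanded, so aws receives them literally as part of the pattern (which is why ps shows them). One way around it: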

# Only quote the asterisk once here
includeList="--exclude * --recursive "
while [ $i -ne $cnt ] # while i != count
do
    #download the ith files
    f=${allFiles[$i]}
    includeList="${includeList}--include $f "
    i=$(( $i + 1 ))
    mod=$(($i))
    if [ $mod -eq 0 ]; then
        # Disable globbing explicitly, call aws, then turn globbing back on:
        set -f
        aws s3 cp s3://bucket/$1/ /storage/ ${includeList}
        set +f
        exit 0
    fi
 done
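
As an alternative to toggling set -f, a bash array keeps each argument as its own element, which also copes with file names that contain spaces. This is only a minimal sketch under a few assumptions: show_args is a made-up helper that stands in for aws so the arguments can be inspected, and the file names are placeholders for whatever allFiles holds in the real script.

```shell
#!/usr/bin/env bash
# show_args is a hypothetical stand-in for aws: it prints every argument
# it receives on its own line, wrapped in brackets.
show_args() { printf '[%s]\n' "$@"; }

# Placeholder file names; the script in the question reads them from allFiles.
allFiles=("a.data" "b.data" "c.data")

# Problem: quotes stored inside a string are not re-parsed on expansion,
# so the pattern reaches the program with the quote characters attached.
asString="--exclude '*'"
set -f                      # keep the shell from globbing during the demo
show_args $asString         # second argument still carries the quote characters
set +f

# Alternative: one array element per argument, expanded with "${args[@]}".
args=(--recursive --exclude '*')
for f in "${allFiles[@]}"; do
    args+=(--include "$f")
done
show_args "${args[@]}"      # here the pattern arrives as a bare asterisk

# The real call would then look like this (bucket and $1 as in the question):
# aws s3 cp "s3://bucket/$1/" /storage/ "${args[@]}" --dryrun
```

Keeping --dryrun on the first run makes it easy to confirm that only the listed files would be copied before removing it.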