AWS S3 download all files with same name with shell

Time:10-13

I would like to download some files from an AWS S3 bucket. They all have the same name but are in different subfolders. No credentials are required to connect to this bucket and download from it. I want to download every file called "B01.tif" under s3://sentinel-cogs/sentinel-s2-l2a-cogs/7/V/EG/ and save each one under a name that includes the subfolder it is in (for example: S2A_7VEG_20170205_0_L2AB01.tif).

Path example:

s3://sentinel-cogs/sentinel-s2-l2a-cogs/7/V/EG/2017/2/S2A_7VEG_20170205_0_L2A/B01.tif

I was thinking of using a bash script that takes the output of ls, downloads each file with cp, and saves it on my PC under a name generated from the path.

Command to use ls:

aws s3 ls s3://sentinel-cogs/sentinel-s2-l2a-cogs/7/V/EG/2017/2/ --no-sign-request

Command to download a single file:

aws s3 cp s3://sentinel-cogs/sentinel-s2-l2a-cogs/7/V/EG/2017/2/S2A_7VEG_20170205_0_L2A/B01.tif --no-sign-request B01.tif

Attempt to download multiple files:

VAR1=B01.tif
for a in s3://sentinel-cogs/sentinel-s2-l2a-cogs/7/V/EG/:    
  for b in s3://sentinel-cogs/sentinel-s2-l2a-cogs/7/V/EG/2017/:
    for c in s3://sentinel-cogs/sentinel-s2-l2a-cogs/7/V/EG/2017/2/:
    
       NAME=$(aws s3 ls s3://sentinel-cogs/sentinel-s2-l2a-cogs/7/V/EG/$a$b$c | head -1)
       
       aws s3 cp s3://sentinel-cogs/sentinel-s2-l2a-cogs/7/V/EG/$NAME/B01.tif --no-sign-request $NAME$VAR1
    
    done
  done
done

I don't know if there is a simple way to go through every subfolder automatically and save the files directly. I know my ls command is broken: if there are multiple subfolders, it will only take the first one as a variable.

Answer:

It's easier to do this in a programming language than in a shell script.

Here's a Python script that will do it for you:

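import boto3
from botocore import UNSIGNED
from botocore.config import Config

BUCKET = 'sentinel-cogs'
PREFIX = 'sentinel-s2-l2a-cogs/7/V/EG/'
FILE = 'B01.tif'

# The bucket is public, so send unsigned (anonymous) requests,
# the boto3 equivalent of --no-sign-request
s3_resource = boto3.resource('s3', config=Config(signature_version=UNSIGNED))

# Walk every object under the prefix and download the ones named B01.tif
for obj in s3_resource.Bucket(BUCKET).objects.filter(Prefix=PREFIX):
    if obj.key.endswith('/' + FILE):
        # e.g. 2017/2/S2A_7VEG_20170205_0_L2A/B01.tif
        #   -> 2017_2_S2A_7VEG_20170205_0_L2A_B01.tif
        target = obj.key[len(PREFIX):].replace('/', '_')
        obj.Object().download_file(target)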
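
Note that the script names each file after its whole path below the prefix, with slashes turned into underscores. If you want exactly the name from the question (the immediate subfolder followed by the file name), a small helper along these lines might do. This is only a sketch: the function names `subfolder_name` and `underscore_name` are made up for illustration and are not part of boto3.

```python
from pathlib import PurePosixPath


def subfolder_name(key: str) -> str:
    """Join the name of the key's immediate parent folder with the file name."""
    path = PurePosixPath(key)  # S3 keys always use '/' as the separator
    return path.parent.name + path.name


def underscore_name(key: str, prefix: str) -> str:
    """Naming used by the script: path below the prefix, '/' replaced by '_'."""
    return key[len(prefix):].replace('/', '_')


if __name__ == '__main__':
    prefix = 'sentinel-s2-l2a-cogs/7/V/EG/'
    key = prefix + '2017/2/S2A_7VEG_20170205_0_L2A/B01.tif'
    print(subfolder_name(key))           # S2A_7VEG_20170205_0_L2AB01.tif
    print(underscore_name(key, prefix))  # 2017_2_S2A_7VEG_20170205_0_L2A_B01.tif
```

To use it, you would compute `target` in the script with `subfolder_name` applied to the object's key. This assumes the subfolder names are unique across years and months; if they are not, later downloads would overwrite earlier ones.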