I have a Python function to download CSV files from an AWS S3 bucket. The folder I want to download from contains many CSV files with various naming conventions, and out of all of them I only want to download the files whose names contain a certain substring.
The files I want to download are named like this:
BANK_NIFTY_5MINs_2020-01-01.csv
BANK_NIFTY_5MINs_2020-01-02.csv
BANK_NIFTY_5MINs_2020-01-03.csv
and so on.
I do not want to download all the CSV files in the 2020 folder, just the ones whose names contain the substring. Can someone please help me with how to do that?
Below is how I call the function, but it does not download anything:
download_from_s3(s3_uri="s3://dir1/dir2/dir3/2020/BANK_NIFTY_5MINs*.csv", local_dir=os.path.join("2020Data"))
How can I specify the substring of the CSV files I want to download?
Answer:
There is no command in Amazon S3 to download objects via a wildcard. At some stage, your code would need to make an API call to S3 with the exact name (Key) of the object you want to download.
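If you want to keep the wildcard-style URI from the question, one option is to split it yourself and do the matching on the client side. Below is a minimal sketch, assuming the first path component of the URI (dir1 in the question) is the bucket name. split_wildcard_uri and key_matches are hypothetical helper names, and the glob matching is done locally with Python's fnmatch, not by S3:

```python
import fnmatch


def split_wildcard_uri(s3_uri):
    """Hypothetical helper: split 's3://bucket/a/b/NAME*.csv' into
    (bucket, prefix, pattern).

    prefix  - the literal part of the key before the first glob character;
              this is something S3 itself can filter on (the Prefix parameter).
    pattern - the full key pattern, to be matched locally with fnmatch.
    """
    if not s3_uri.startswith("s3://"):
        raise ValueError(f"not an s3:// URI: {s3_uri}")
    # Parse by hand rather than with urlparse, because '?' and '#' are
    # meaningful in URLs but '?' is also a glob wildcard.
    bucket, _, pattern = s3_uri[len("s3://"):].partition("/")
    # Cut the key pattern at the first glob metacharacter.
    cut = min((i for i, ch in enumerate(pattern) if ch in "*?["),
              default=len(pattern))
    return bucket, pattern[:cut], pattern


def key_matches(key, pattern):
    # fnmatchcase is case-sensitive on every OS, like S3 keys are.
    return fnmatch.fnmatchcase(key, pattern)
```

Note that in fnmatch a `*` also matches `/`, so a pattern can match keys in deeper "subfolders"; the literal prefix is the part you would hand to S3 when listing.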
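Therefore, your code needs to call list_objects_v2() to obtain a listing of the objects in the S3 bucket; its Prefix parameter lets you restrict the listing to keys under the 2020 folder. Then use ordinary string comparison in Python to decide which objects you want, and call download_file() for each of them. Note that list_objects_v2() returns at most 1,000 keys per call, so for larger folders you need to follow the continuation token or use a paginator.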
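A minimal sketch of that list, filter, and download loop, assuming `s3` is a boto3 S3 client (for example `boto3.client("s3")`). The function names download_matching and wanted are hypothetical, and the client is passed in rather than created inside so the filtering logic can be exercised without AWS credentials:

```python
import os


def wanted(key, substring):
    """Hypothetical filter: keep CSV files whose file name contains substring."""
    filename = key.rsplit("/", 1)[-1]  # drop the "folder" part of the key
    return substring in filename and filename.endswith(".csv")


def download_matching(s3, bucket, prefix, substring, local_dir):
    """Download every object under `prefix` whose key passes `wanted`.

    `s3` is assumed to be a boto3 S3 client. Returns the downloaded keys.
    """
    os.makedirs(local_dir, exist_ok=True)
    downloaded = []
    # A paginator follows the continuation token for us, so listings
    # larger than 1,000 keys are handled.
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is missing from a page when no keys match the prefix.
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if wanted(key, substring):
                local_path = os.path.join(local_dir, key.rsplit("/", 1)[-1])
                s3.download_file(bucket, key, local_path)
                downloaded.append(key)
    return downloaded
```

With the names from the question this would be called as `download_matching(boto3.client("s3"), "dir1", "dir2/dir3/2020/", "BANK_NIFTY_5MINs", "2020Data")`. Because all the wanted files start with BANK_NIFTY_5MINs, you could also pass `"dir2/dir3/2020/BANK_NIFTY_5MINs"` as the prefix so S3 returns only those keys in the first place.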