Reading a subset of csv files from S3 bucket using lambda and boto3

Time:03-02

My S3 bucket holds around 30 CSV files, classified into 3 categories. With my Lambda function I want to pick only the 8 that belong to category 1. I used the answer to this question: Reading multiple csv files from S3 bucket with boto3

and came up with the following code:

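import io

import pandas as pd

def read_prefix_to_df(prefix, s3_resource, bucket_name):
    bucket = s3_resource.Bucket(bucket_name)
    # Prefix is matched literally against the start of each object key
    prefix_objs = bucket.objects.filter(Prefix=prefix)
    prefix_df = []
    for obj in prefix_objs:
        body = obj.get()['Body'].read()
        # Parse the raw CSV bytes; pd.DataFrame(body) does not parse CSV
        df = pd.read_csv(io.BytesIO(body))
        prefix_df.append(df)
    return prefix_df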

where:

bucket_name='my_bucket'
prefix='folder/data_overview_*.csv'

All 8 files have almost the same name, differing only in the date at the end, which is why I used * to pick every file starting with data_overview_. Unfortunately, the returned list of dataframes was empty. Should I change the prefix?

Answer:

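Prefixes cannot contain wildcard characters. The prefix is matched literally against the start of each object key, so the * is treated as an ordinary character and no key matches.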

You should use:

prefix = 'folder/data_overview_'

If you need to further limit to only CSV files, then you will need to do that with an if statement within your Python code.
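As a rough sketch of that if statement: the helper below lists the keys under a literal prefix and keeps only those ending in .csv. The name csv_keys_under_prefix is made up for illustration, and bucket is assumed to be a boto3 Bucket resource such as the one created in the question's code.

```python
def csv_keys_under_prefix(bucket, prefix):
    """Return the keys under `prefix` that end in '.csv'.

    `bucket` is assumed to be a boto3 Bucket resource,
    e.g. boto3.resource('s3').Bucket('my_bucket').
    """
    keys = []
    # S3 applies the prefix server-side, as a literal match (no wildcards).
    for obj in bucket.objects.filter(Prefix=prefix):
        # The extension check happens client-side, in Python.
        if obj.key.endswith('.csv'):
            keys.append(obj.key)
    return keys
```

With prefix = 'folder/data_overview_' this should return only the data_overview_ CSV files. The same endswith check could go inside the loop of read_prefix_to_df, before the object body is read.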
