How to read file in s3 directory only knowing file extension using boto3


I have a Spark workflow which outputs a CSV file into its own directory with a randomized filename, along with a few accessory files that are not .csv files. I need to read that CSV file from a separate Python workflow. If I knew the exact filename, I would use:

bucket = "bucketName"
file_name = "/user/myName/output/date/dataset/file_name.csv"
s3 = boto3.client('s3') 
obj = s3.read_object(Bucket= bucket, Key= file_name) 

Since I don't know the exact file name, what I need to do is simply read the only file in that S3 path that has a .csv extension.

CodePudding user response:

You will need to provide the exact Key to S3 to access the object.

Therefore, you will first need to list the contents of the bucket. Here's some code that prints the key of the first CSV object under a given prefix.

import boto3

s3 = boto3.client('s3')

bucket = 'bucketName'

# List the objects under the given prefix ("directory")
response = s3.list_objects_v2(Bucket=bucket, Prefix='folder1/')

# Keep only the keys that end with .csv
objects = [obj['Key'] for obj in response.get('Contents', []) if obj['Key'].endswith('.csv')]

if len(objects) > 0:
    print(objects[0])
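
If you also want to read that CSV in the same script, here is a minimal sketch that combines the two steps. The bucket name and prefix below are placeholders based on the question's example path, and pandas is just one possible way to parse the downloaded body:

import boto3
import pandas as pd  # assumption: pandas is available for parsing the CSV

bucket = 'bucketName'
prefix = 'user/myName/output/date/dataset/'  # hypothetical prefix matching the question's path

s3 = boto3.client('s3')

# Find the first .csv key under the prefix
response = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
csv_keys = [o['Key'] for o in response.get('Contents', []) if o['Key'].endswith('.csv')]

if csv_keys:
    # Download the object and parse its streamed body as CSV
    obj = s3.get_object(Bucket=bucket, Key=csv_keys[0])
    df = pd.read_csv(obj['Body'])
    print(df.head())
else:
    print('No .csv object found under', prefix)

Note that list_objects_v2 returns at most 1000 keys per call; for a directory that only ever contains one CSV plus a few accessory files, a single call is enough.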