Python read file from S3 by generate_presigned_url

Time:12-18

I have two files in S3 and need to read them from Python code running outside of AWS. To do that, I generate a presigned URL for each file. The first part of the code, which reads the file in text format, works fine:

response = client.generate_presigned_url('get_object',
                                         Params={'Bucket': ...,
                                                 'Key': 'file.csv'},
                                         ExpiresIn=3600)

df = pd.read_csv(response)

However, I get an error for the second file, which is in RRF format:

response = client.generate_presigned_url('get_object',
                                         Params={'Bucket': ...,
                                                 'Key': 'finlename.RRF'},
                                         ExpiresIn=3600)
with open(response, encoding="utf8") as fp:
    for cnt, line in enumerate(fp):
        line = line.strip()

Here is the error:

---> 28     with open(file_name, encoding="utf8") as fp:
     29         for cnt, line in enumerate(fp):
     30             line=line.strip()

OSError: [Errno 22] Invalid argument: 'https://....'

I am wondering whether this is related to the file format: I can read any CSV or text file with pandas read_csv without any issue, but for any other format, where I use open, I get the same error.

CodePudding user response:

The file format does make the difference, but only indirectly. Read the documentation of pandas.read_csv:

Any valid string path is acceptable. The string could be a URL. Valid URL schemes include http, ftp, s3, gs, and file.

You pass an HTTPS URL to read_csv and the method is able to handle it.

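However, Python's built-in open function expects a path on the local filesystem (or a file descriptor) and can't handle URLs, which is why it raises OSError when given the presigned URL. You need to use an HTTP client library to download the file and then process it.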
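
As a minimal sketch of that approach, the loop from the question can be run over an HTTP response instead of a local file. This uses urllib.request from the standard library; the helper name iter_stripped_lines is made up for illustration, and the sketch assumes the file is UTF-8 text (as the encoding="utf8" in the question suggests).

```python
import urllib.request


def iter_stripped_lines(url, encoding="utf8"):
    """Yield (index, stripped line) pairs for the text resource at `url`.

    Hypothetical helper, not part of boto3 or pandas. Assumes an encoding
    such as UTF-8 in which a newline is always the single byte b"\\n".
    """
    # urlopen fetches the URL; the response is iterated line by line as bytes,
    # so a large file does not have to be loaded into memory at once.
    with urllib.request.urlopen(url) as resp:
        for cnt, raw_line in enumerate(resp):
            yield cnt, raw_line.decode(encoding).strip()
```

With the presigned URL from the question, the loop would then read `for cnt, line in iter_stripped_lines(response):` in place of the `with open(...)` block. A third-party client such as requests would work just as well; urllib is used here only to avoid an extra dependency.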
