I have two files in S3 and need to read them from Python code running outside of AWS. To do that, I generate a presigned URL for each file. The first part of the code, which reads the CSV file, works fine:
response = client.generate_presigned_url('get_object',
                                          Params={'Bucket': ...,
                                                  'Key': 'file.csv'},
                                          ExpiresIn=3600)
df = pd.read_csv(response)
However, I get an error for the second file, which is in RRF format:
response = client.generate_presigned_url('get_object',
                                          Params={'Bucket': ...,
                                                  'Key': 'filename.RRF'},
                                          ExpiresIn=3600)
with open(response, encoding="utf8") as fp:
    for cnt, line in enumerate(fp):
        line = line.strip()
Here is the error:
---> 28 with open(file_name, encoding="utf8") as fp:
29 for cnt, line in enumerate(fp):
30 line=line.strip()
OSError: [Errno 22] Invalid argument: 'https://....'
I am wondering whether this is related to the file format: I can read any CSV or text file with pandas read_csv without any issue, but for every other format, using open gives the same error.
Answer:
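The file format matters only indirectly: what actually makes the difference is that you read the CSV file with pandas.read_csv and the other files with the built-in open. Read the documentation of pandas.read_csv: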
Any valid string path is acceptable. The string could be a URL. Valid URL schemes include http, ftp, s3, gs, and file.
You pass an HTTPS URL to read_csv, and read_csv can handle it: pandas fetches the content from the URL for you.
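However, Python's built-in open function expects a local file path (or a file descriptor), not a URL. It treats the whole presigned URL as a file name on your disk, which fails with the OSError shown above. You need to use an HTTP client library to download the file and then process it.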
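A minimal sketch using the standard-library urllib.request, assuming the RRF file is UTF-8 encoded text (as the encoding="utf8" argument in the question suggests). iter_url_lines is a hypothetical helper name, and url stands for the presigned URL returned by generate_presigned_url. It streams the response line by line rather than loading the whole file into memory:

```python
import urllib.request


def iter_url_lines(url, encoding="utf-8", timeout=30):
    """Yield each line of the resource at `url`, decoded and stripped.

    Hypothetical helper: it works for http(s) URLs such as an S3
    presigned URL, and also for file:// and data: URLs, because
    urlopen handles all of these schemes.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Iterating over the response yields raw byte lines (ending in b"\n").
        for raw_line in response:
            yield raw_line.decode(encoding).strip()
```

In the code from the question, this replaces the open call, so the loop becomes `for cnt, line in enumerate(iter_url_lines(url)):`. Decoding each line separately is safe for UTF-8 because the newline byte never occurs inside a multi-byte character; for an encoding such as UTF-16 you would need to read and decode the whole body first. If the presigned URL has expired, urlopen raises urllib.error.HTTPError instead of returning data.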