I have this code that must take a TXT file from a source bucket and convert it to CSV in a destination bucket. It returns a response saying that the variable (z) that should contain the CSV file cannot be opened because it is None. It seems that my code is not transforming the object correctly. Please, I need help correcting it.
- The code is the following:
import pandas as pd
import json
import boto3
from io import BytesIO

def lambda_handler(event, context):
    s3_resource = boto3.resource('s3')
    source_bucket = 'testsigma2'
    target_bucket = 'testsigma3'
    my_bucket = s3_resource.Bucket(source_bucket)
    for file in my_bucket.objects.all():
        if(str(file.key).endswith('.txt')):
            zip_obj = s3_resource.Object(bucket_name=source_bucket, key=file.key)
            buffer= BytesIO(zip_obj.get()['Body'].read())
            dataframe1=pd.read_csv(buffer)
            z = dataframe1.to_csv(buffer,index=None)
            response = s3_resource.meta.client.upload_fileobj(
                z.open(filename),
                Bucket = target_bucket,
                key = f'{filename}'
            )
        else:
            print(file.key + ' is not a zip file.')
Response
{
"errorMessage": "'NoneType' object has no attribute 'open'",
"errorType": "AttributeError",
"stackTrace": [
" File \"/var/task/lambda_function.py\", line 25, in lambda_handler\n z.open(filename),\n"
]
}
CodePudding user response:
It looks like you are trying to call open on the z object after calling the to_csv method, but when to_csv is given a buffer it does not return a file object. Instead, it returns None and writes the CSV data directly to the buffer object that you provided as an argument. To upload the result, call the seek method on the buffer after calling to_csv, so that the file pointer is back at the beginning of the file, and pass the buffer itself to upload_fileobj:
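dataframe1 = pd.read_csv(buffer)
# Rewind and empty the buffer so the CSV replaces the original text instead of being appended to it
buffer.seek(0)
buffer.truncate()
# to_csv writes into the buffer and returns None, so there is nothing to assign
dataframe1.to_csv(buffer, index=None)
# Reset the position of the file pointer to the beginning of the file
buffer.seek(0)
response = s3_resource.meta.client.upload_fileobj(
    buffer,
    Bucket=target_bucket,
    Key=filename  # filename is not defined in the question's code; set it first, e.g. from file.key
)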
You can then use the buffer object as the file object to be uploaded to S3.
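As a rough sketch of the whole conversion step, the following keeps the download, the conversion, and the upload separate and writes the CSV into a fresh buffer rather than reusing the input one. The helper names (txt_to_csv_buffer, csv_key_for, convert_object) and the rule of renaming .txt to .csv are assumptions for illustration, not part of the original code. The s3_client argument is assumed to be something like boto3.client('s3').

```python
from io import BytesIO

import pandas as pd


def txt_to_csv_buffer(raw_bytes):
    """Parse delimited text bytes and return a new BytesIO holding the CSV."""
    dataframe = pd.read_csv(BytesIO(raw_bytes))
    # With no target given, to_csv returns the CSV as a string.
    csv_text = dataframe.to_csv(index=False)
    # A freshly created BytesIO starts at position 0, so no seek is needed.
    return BytesIO(csv_text.encode("utf-8"))


def csv_key_for(txt_key):
    """Hypothetical naming rule: swap a trailing .txt for .csv."""
    return txt_key[: -len(".txt")] + ".csv"


def convert_object(s3_client, source_bucket, target_bucket, key):
    """Download one .txt object, convert it, and upload the CSV."""
    body = s3_client.get_object(Bucket=source_bucket, Key=key)["Body"].read()
    csv_buffer = txt_to_csv_buffer(body)
    target_key = csv_key_for(key)
    s3_client.upload_fileobj(Fileobj=csv_buffer, Bucket=target_bucket, Key=target_key)
    return target_key
```

Inside the loop from the question, this could be called as convert_object(s3_resource.meta.client, source_bucket, target_bucket, file.key) for each key that ends with '.txt'.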
CodePudding user response:
The to_csv method on a pandas dataframe doesn't return the buffer; when it is given a buffer, it returns None and writes to that buffer. So z is None, and calling z.open(filename) raises the error. Try passing the buffer itself to upload_fileobj. Additionally, I don't think filename is defined anywhere, so be aware of that.
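A small check of that return value, using a made-up one-row dataframe and an in-memory StringIO (both are only for illustration):

```python
from io import StringIO

import pandas as pd

frame = pd.DataFrame({"a": [1], "b": [2]})

# With a buffer argument, to_csv writes into the buffer and returns None.
text_buffer = StringIO()
result_with_buffer = frame.to_csv(text_buffer, index=False)

# With no target, to_csv returns the CSV text itself.
result_without_buffer = frame.to_csv(index=False)

print(result_with_buffer is None)
print(text_buffer.getvalue() == result_without_buffer)
```

This is why z in the question ends up as None: the CSV data lives in the buffer, not in the return value.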
For documentation on the specific resources you're using, check these out: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html and https://boto3.amazonaws.com/v1/documentation/api/latest/guide/s3-uploading-files.html