Convert txt file to csv with lambda s3 Aws


I have code that is supposed to take a TXT file from a source bucket, convert it to CSV, and write it to a destination bucket. Instead it returns an error saying that the variable (z) which should contain the CSV file cannot be opened because it is None. It seems the code is not transforming the object correctly. Please help me correct it.

  • The code is the following:

import pandas as pd
import json
import boto3
from io import BytesIO

def lambda_handler(event, context):

    s3_resource = boto3.resource('s3')
    source_bucket = 'testsigma2'
    target_bucket = 'testsigma3'

    my_bucket = s3_resource.Bucket(source_bucket)

    for file in my_bucket.objects.all():
        if str(file.key).endswith('.txt'):

            zip_obj = s3_resource.Object(bucket_name=source_bucket, key=file.key)

            buffer = BytesIO(zip_obj.get()['Body'].read())

            dataframe1 = pd.read_csv(buffer)
            z = dataframe1.to_csv(buffer, index=None)

            response = s3_resource.meta.client.upload_fileobj(
                z.open(filename),
                Bucket=target_bucket,
                key=f'{filename}'
            )

        else:
            print(file.key + ' is not a zip file.')
Response
{
  "errorMessage": "'NoneType' object has no attribute 'open'",
  "errorType": "AttributeError",
  "stackTrace": [
    "  File \"/var/task/lambda_function.py\", line 25, in lambda_handler\n    z.open(filename),\n"
  ]
}

CodePudding user response:

It looks like you are trying to open the z object after calling the to_csv method, but to_csv does not return a file object. When you pass it a buffer, it writes the CSV data directly into that buffer and returns None. After calling to_csv, call seek on the buffer to reset the file pointer to the beginning of the file before uploading:

dataframe1 = pd.read_csv(buffer)
dataframe1.to_csv(buffer, index=None)  # writes into buffer and returns None

# Reset the position of the file pointer to the beginning of the file
buffer.seek(0)

response = s3_resource.meta.client.upload_fileobj(
    buffer,
    Bucket=target_bucket,
    Key=f'{filename}'
)

You can then use the buffer object as the file object to be uploaded to S3.
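
For completeness, here is a minimal end-to-end sketch of the handler. It is only a sketch under a few assumptions: the bucket names come from the question, the target key is derived from the source key (the question never defines filename), and the CSV is produced as a string with to_csv() and uploaded from a fresh BytesIO rather than written back into the buffer that was just read, so the uploaded object contains only the converted data.

import boto3
import pandas as pd
from io import BytesIO

def lambda_handler(event, context):
    s3_resource = boto3.resource('s3')
    source_bucket = 'testsigma2'   # bucket names taken from the question
    target_bucket = 'testsigma3'

    for obj in s3_resource.Bucket(source_bucket).objects.all():
        if not str(obj.key).endswith('.txt'):
            print(obj.key + ' is not a txt file.')
            continue

        # Download the object and parse it with pandas
        body = obj.get()['Body'].read()
        dataframe = pd.read_csv(BytesIO(body))

        # to_csv() with no target returns the CSV as a string; encode it and
        # wrap it in a fresh buffer so only the converted data is uploaded
        csv_bytes = dataframe.to_csv(index=False).encode('utf-8')

        # Hypothetical target key: same name with a .csv extension
        target_key = obj.key.rsplit('.', 1)[0] + '.csv'

        s3_resource.meta.client.upload_fileobj(
            BytesIO(csv_bytes),
            Bucket=target_bucket,
            Key=target_key
        )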

CodePudding user response:

The to_csv method on a pandas DataFrame doesn't return the buffer: it writes into the buffer you pass it and returns None. So when you take that return value and try to call open(filename) on it, you get the AttributeError. Pass the buffer itself to upload_fileobj instead. Additionally, filename doesn't appear to be defined anywhere, so be aware of that.
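
To make the difference concrete, here is a tiny illustrative snippet (the DataFrame contents are made up):

import pandas as pd
from io import StringIO

df = pd.DataFrame({'a': [1, 2]})

# When a buffer (or path) is given, to_csv writes to it and returns None
buf = StringIO()
print(df.to_csv(buf))       # None

# When no target is given, to_csv returns the CSV text
print(repr(df.to_csv()))    # ',a\n0,1\n1,2\n'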

For documentation on the specific resources you're using, see https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html and https://boto3.amazonaws.com/v1/documentation/api/latest/guide/s3-uploading-files.html
