I have a Python 3.8 script running in AWS Lambda.
It is supposed to read a file from S3 line by line with csv.reader(data_in, delimiter=',') and write line by line to another CSV file in the same bucket.
It appears to work fine, but when I look at the output file, it always has fewer lines.
Here is the script:
import json
import os
import io
import boto3
import csv
import time

def lambda_handler(event, context):
    s3 = boto3.resource(u's3')
    s3_object_in = s3.Object('MYBUCKET', 'INPUT.csv')
    data_in = s3_object_in.get()['Body'].read().decode('utf-8').splitlines()
    lines = csv.reader(data_in, delimiter=',')
    lambda_path = "/tmp/temp.csv"
    with open(lambda_path, 'w+', encoding="utf-8") as file:
        i_in = 0
        for line in lines:
            file.write(str(i_in) + '\n')
            i_in += 1
        s3.Bucket('MYBUCKET').upload_file(lambda_path, 'out.csv')
        file.close()
    s3_object_out = s3.Object('MYBUCKET', 'out.csv')
    data_out = s3_object_out.get()['Body'].read().decode('utf-8').splitlines()
    lines_out = csv.reader(data_out)
    i_out = 0
    for line in lines_out:
        i_out += 1
    return {
        'count_in': i_in,
        'count_out': i_out
    }
This code returns the following response when tested:
Response:

{
    "count_in": 25428,
    "count_out": 25057
}
So, judging by i_in, the loop clearly reaches the last line of the input file (which indeed has 25428 lines). But file.write stops at the 25057th line.
The output file contains a continuous counter starting at 0 and ending at 25056.
Any ideas?
I am running with 1024 MB of RAM and a 10-minute timeout.
CodePudding user response:
The following code has two problems:
with open(lambda_path, 'w+', encoding="utf-8") as file:
    i_in = 0
    for line in lines:
        file.write(str(i_in) + '\n')
        i_in += 1
    s3.Bucket('MYBUCKET').upload_file(lambda_path, 'out.csv')
    file.close()
Specifically, the two problems are:

- The file is being uploaded to S3 while still inside the with context manager, so the file might not be fully written to disk yet. Python buffers writes, and the buffer is only guaranteed to be flushed when the file is closed, which is why the last lines are missing from the uploaded copy.
- The with context manager will automatically close the file, so file.close() is not needed.
The code should be written like this, with the upload happening after the with block has closed the file:

with open(lambda_path, 'w+', encoding="utf-8") as file:
    i_in = 0
    for line in lines:
        file.write(str(i_in) + '\n')
        i_in += 1

s3.Bucket('MYBUCKET').upload_file(lambda_path, 'out.csv')
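If the upload really had to happen while the file was still open, explicitly flushing the buffers first would also work. Here is a minimal sketch of that variant (it uses os, which the script already imports):

with open(lambda_path, 'w+', encoding="utf-8") as file:
    i_in = 0
    for line in lines:
        file.write(str(i_in) + '\n')
        i_in += 1
    file.flush()             # push Python's internal buffer to the OS
    os.fsync(file.fileno())  # ask the OS to commit the bytes to disk
    s3.Bucket('MYBUCKET').upload_file(lambda_path, 'out.csv')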
See Context Managers for more details.
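As a side note, since the script already imports io, the temporary file could be skipped entirely by writing to an in-memory buffer and uploading it with boto3's upload_fileobj. A minimal sketch, assuming the same bucket and key names:

buffer = io.BytesIO()
i_in = 0
for line in lines:
    buffer.write((str(i_in) + '\n').encode('utf-8'))
    i_in += 1
buffer.seek(0)  # rewind so upload_fileobj reads from the start
s3.Bucket('MYBUCKET').upload_fileobj(buffer, 'out.csv')

This also avoids any dependence on /tmp, which is limited to 512 MB by default in Lambda.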