I've created a Lambda that scans my S3 bucket and collects some metadata for each object it finds. However, I'm hitting a roadblock when exporting the data to a CSV: the CSV only contains one record. How can I get the CSV to include all objects?
Please see my Lambda code below:
import re
import datetime
from datetime import date
import os
import math
import csv
import boto3
import logging

s3 = boto3.client('s3')
logger = logging.getLogger()
logger.setLevel(logging.INFO)
time = date.today().strftime("%d/%m/%Y")

def lambda_handler(event, context):
    s3_resource = boto3.resource('s3')
    result = []
    bucket = s3_resource.Bucket('dev-bucket')
    key = 'csv_file.csv'
    for object in bucket.objects.all():
        name = object.key
        size = object.size
        si = list(name)
        dates = object.last_modified.strftime("%d/%m/%Y")
        owner = object.owner['DisplayName']
        days_since_creation = datetime.datetime.strptime(time, "%d/%m/%Y") - datetime.datetime.strptime(dates, "%d/%m/%Y")
        days_since_creation = days_since_creation.days
        to_delete = []
        if days_since_creation >= 30:
            to_delete = 'Y'
        else:
            to_delete = 'N'
        myfile = open("/tmp/csv_file.csv", "w")
        writer = csv.writer(myfile, delimiter='|')
        rows = [name, size, dates, days_since_creation]
        writer.writerow(rows)
        myfile.close()
    # upload the data into s3
    s3.upload_file('/tmp/csv_file.csv', 'dev-bucket', 'cleanuptest.csv')
    print(rows)
My current output is this:
09ff0687-a644-4d5e-9de8-277594b194a6.csv.metadata|280|29/11/2021|78
The preferred output would be:
0944ee8b-1e17-496a-9196-0caed1e1de11.csv.metadata|152|08/12/2021|69
0954d7e5-dcc6-4cb6-8c07-70cbf37a73ef.csv|8776432|16/11/2021|91
0954d7e5-dcc6-4cb6-8c07-70cbf37a73ef.csv.metadata|336|16/11/2021|91
0959edc4-fa02-493f-9c05-9040964f4756.csv|6338|29/11/2021|78
0959edc4-fa02-493f-9c05-9040964f4756.csv.metadata|225|29/11/2021|78
0965cf32-fc31-4acc-9c32-a983d8ea720d.txt|844|10/12/2021|67
0965cf32-fc31-4acc-9c32-a983d8ea720d.txt.metadata|312|10/12/2021|67
096ed35c-e2a7-4ec4-8dae-f87b42bfe97c.csv|1761|09/12/2021|68
Unfortunately, I cannot get it right and I'm not sure what I'm doing wrong. Any help would be appreciated.
CodePudding user response:
In your current setup, you open, write, and close the file inside the loop, once per object. Because 'w' mode truncates the file on every open, each iteration erases the previous rows, so at the end your file contains only the last row.
What you probably want is this:
myfile = open("/tmp/csv_file.csv", "w")
for object in bucket.objects.all():
    <the looping logic>
myfile.close()
s3.upload_file('/tmp/csv_file.csv', 'dev-bucket', 'cleanuptest.csv')
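Putting it together, a minimal corrected sketch of your handler might look like this (it keeps the dev-bucket name, the '|' delimiter, and the /tmp path from your question; the si, owner, and to_delete variables are dropped here since they never make it into the CSV):

import boto3
import csv
import datetime
from datetime import date

s3 = boto3.client('s3')
time = date.today().strftime("%d/%m/%Y")

def lambda_handler(event, context):
    s3_resource = boto3.resource('s3')
    bucket = s3_resource.Bucket('dev-bucket')
    # open the file once, before the loop, so rows accumulate
    with open("/tmp/csv_file.csv", "w", newline="") as myfile:
        writer = csv.writer(myfile, delimiter='|')
        for obj in bucket.objects.all():
            name = obj.key
            size = obj.size
            dates = obj.last_modified.strftime("%d/%m/%Y")
            days_since_creation = (datetime.datetime.strptime(time, "%d/%m/%Y")
                                   - datetime.datetime.strptime(dates, "%d/%m/%Y")).days
            # one writerow per object -> one line per object in the CSV
            writer.writerow([name, size, dates, days_since_creation])
    # upload once, after the file is complete and closed
    s3.upload_file('/tmp/csv_file.csv', 'dev-bucket', 'cleanuptest.csv')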
You can prove that opening and closing the file each time rewrites it from scratch by running the minimal version of your script below:
import csv

myfile1 = open("csv_file.csv", "w")
writer1 = csv.writer(myfile1, delimiter='|')
row1 = "a", "b", "c"
rows1 = list(row1)
writer1.writerow(rows1)
myfile1.close()
print(rows1)

myfile2 = open("csv_file.csv", "w")
writer2 = csv.writer(myfile2, delimiter='|')
row2 = "x", "y", "z"
rows2 = list(row2)
writer2.writerow(rows2)
myfile2.close()
print(rows2)
Output in the file (only the second write survives):
x|y|z
CodePudding user response:
FYI you can also open the file in append mode using 'a' to ensure the rows are not overwritten:
myfile = open("/tmp/csv_file.csv", "a")
Using 'w' has the below caveat, as mentioned in the docs:
'w' for only writing (an existing file with the same name will be erased)
'a' opens the file for appending; any data written to the file is automatically added to the end.
...
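To see the difference, here is the two-write demo from the first answer again, with the second open switched to append mode; the file ends up with both rows:

import csv

# first write: 'w' creates/truncates the file
myfile1 = open("csv_file.csv", "w")
writer1 = csv.writer(myfile1, delimiter='|')
writer1.writerow(["a", "b", "c"])
myfile1.close()

# second write: 'a' appends instead of erasing
myfile2 = open("csv_file.csv", "a")
writer2 = csv.writer(myfile2, delimiter='|')
writer2.writerow(["x", "y", "z"])
myfile2.close()

Output in the file:
a|b|c
x|y|z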