I need to write a Python script that compares two CSV files and writes their differences to a third CSV file. The comparison is based on dates in a specific format, and the third file should hold the entries that differ between the two files.
#reads both files and puts them into a table
Id = "ID"
Date = "Date"
with open('example.csv', 'r') as t1, open ('example2.csv', 'r') as t2:
    t1.write(Id Date "\n")
    t1.close()
    t2.write(Id Date "\n")
    t2.close()
    fileone = t1.readlines()
    filetwo = t2.readlines()
#function to write a third file that outputs differences
with open ('DIFF.csv', 'w') as outfile:
    for line in filetwo:
        if line not in fileone:
            #wr = csv.writer(outfile, dialect='csv')
            #wr.writerow([line.rstrip('\n')])
            outfile.write(line)
    outfile.close()
print("csv is ready")
CodePudding user response:
If I understood the question correctly, you have two files with dates in a particular format, listed like this (I'll use my local format, but you can specify the format in the code):
example.csv
20/07/2022 15:01
20/07/2022 15:02
20/07/2022 15:03
And:
example2.csv
20/07/2022 14:02
20/07/2022 15:01
20/07/2022 15:08
You want to retrieve the symmetric difference of these files in terms of dates (the dates that are in one file but not in the other):
output
20/07/2022 15:03
20/07/2022 15:08
20/07/2022 14:02
20/07/2022 15:02
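As a quick check of the idea before touching any files, here is a minimal sketch that computes the same symmetric difference in memory with Python's set operator `^`. The variable names `first`, `second` and `only_in_one` are just illustrative, and the dates are copied from the two example listings above.

```python
# Dates copied from the example.csv and example2.csv listings above
first = {"20/07/2022 15:01", "20/07/2022 15:02", "20/07/2022 15:03"}
second = {"20/07/2022 14:02", "20/07/2022 15:01", "20/07/2022 15:08"}

# ^ keeps the items that appear in exactly one of the two sets
only_in_one = first ^ second

# Sets are unordered, so sort before printing to get a stable result
for date in sorted(only_in_one):
    print(date)
```

Sorting the raw strings happens to give chronological order here only because all four dates share the same day. For real data, parse the strings into datetime objects before sorting.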
Here is the code to do so:
from datetime import datetime

# Write the format you want; it must match the dates in both files
my_format = "%d/%m/%Y %H:%M"

# Convert one line of text to a datetime object (strip() drops the trailing newline)
def str_to_datetime(line):
    return datetime.strptime(line.strip(), my_format)

with open('example.csv', 'r') as t1, open('example2.csv', 'r') as t2, open('DIFF.csv', 'w') as outfile:
    # Skip blank lines so a trailing empty line does not raise ValueError
    first_set = {str_to_datetime(line) for line in t1 if line.strip()}
    second_set = {str_to_datetime(line) for line in t2 if line.strip()}
    # ^ is the symmetric difference; sets are unordered, so wrap it in sorted() for chronological output
    for date in first_set ^ second_set:
        outfile.write(date.strftime(my_format) + "\n")
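The question's code mentions an "ID" and a "Date" column, so the real files may hold more than a bare date per line. The following is only a sketch under that assumption: it supposes both files are comma-separated with a header row `ID,Date`, and the helper names `read_rows` and `write_diff` are made up for illustration. It uses the standard `csv` module and writes the differing rows in chronological order.

```python
import csv
from datetime import datetime

DATE_FORMAT = "%d/%m/%Y %H:%M"  # assumed format, adjust to your data


def read_rows(path):
    """Return a set of (id, datetime) tuples from a CSV with ID and Date columns."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        return {
            (row["ID"], datetime.strptime(row["Date"].strip(), DATE_FORMAT))
            for row in reader
        }


def write_diff(path_one, path_two, out_path):
    """Write rows found in exactly one input file and return how many there are."""
    diff = read_rows(path_one) ^ read_rows(path_two)
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["ID", "Date"])
        # Sort by date first, then by ID, so the output order is deterministic
        for row_id, date in sorted(diff, key=lambda r: (r[1], r[0])):
            writer.writerow([row_id, date.strftime(DATE_FORMAT)])
    return len(diff)
```

Calling `write_diff('example.csv', 'example2.csv', 'DIFF.csv')` would then produce the third file. Note that a row counts as different if either its ID or its date differs; compare on the date alone if that is what you need.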