Python - Compare two files and write changed rows and new rows to a new file-CodePudding

Using Python (Pandas being acceptable) I need to compare file2 to file1 and output to file3 the rows from file2 that are either:

Present in file1 but have some changed values
Not present in file1

file1

ID,Format,Output,FontSize,isDefault
1,LP,LETTER,14,False
2,LP,LETTER,16,False
3,LP,LETTER,18,True
4,LP,LETTER,20,False
5,LP,LETTER,22,False

file2

ID,Format,Output,FontSize,isDefault
2,LP,LETTER,16,False
3,AU,LETTER,18,False
4,BR,LETTER,20,False
5,LP,LETTER,22,False
6,LP,LEDGER,24,False
7,LP,LEDGER,36,False

file3

ID,Format,Output,FontSize,isDefault
3,AU,LETTER,18,False
4,BR,LETTER,20,False
6,LP,LEDGER,24,False
7,LP,LEDGER,36,False

In this example the ID column is unique.

CodePudding user response：

Assuming that the files are CSVs, and that the IDs are simply the row numbers, you could get a csv.reader() to get lists of the rows from the two files. Then you loop over them to see if the same row exists in both files; if not, you add them to a new list:

new_list = []
number_of_rows = min(len(list_from_file1), len(list_from_file2))
for i in range(number_of_rows):
    if list_from_file1[i] != list_from_file2[i]:
        new_list.append(list_from_file2[i])

That will copy all rows that exist in both files but with different content into new_list. You can then also copy all remaining rows from the longer file into new_list too, and write new_list into file3 with csv.writer.

CodePudding user response：

Assuming you are reading csv files, try this code

import pandas as pd
import numpy as np
import csv

#Instead of below 2 lines of code, read data from files using the link at the end of code, below.
df1 = pd.DataFrame([[1,'LP','LETTER',14,False],[2,'LP','LETTER',16,False]], columns =['ID','Format','Output','FontSize','isDefault'])
df2 = pd.DataFrame([[2,'LP','LETTER',16,False],[3,'AU','LETTER',18,False]], columns =['ID','Format','Output','FontSize','isDefault'])

result = []

for index, row_df2 in df2.iterrows():
    if row_df2['ID'] in df1.ID.to_list():
        row_df1 = df1[df1.ID == row_df2['ID']].iloc[0] 
        if np.array_equal(row_df2.values,row_df1.values):
            result.append(row_df2.to_list()   )
    else:
        result.append(row_df2.to_list()  )

#Write to CSV file
with open('file3.csv', 'w', newline='') as myfile:
     wr = csv.writer(myfile, quoting=csv.QUOTE_ALL)
     wr.writerow(['ID','Format','Output','FontSize','isDefault'])
     for val in result:
        wr.writerow(val)

Import CSV file as a pandas DataFrame