Home > Blockchain >  Python to compare two csv or excel files and print custom output
Python to compare two csv or excel files and print custom output

Time:11-15

I am trying to compare two CSV files and print the differences in Python as a custom text file.

For example:

CSV 1:

Id, Customer, Status, Date
01, ABC, Good, Mar 2023
02, BAC, Good, Feb 2024
03, CBA, Bad, Apr 2022

CSV 2:

Id, Customer, Status, Date
01, ABC, Bad, Mar 2023
02, BAC, Good, Feb 2024
03, CBA, Good, Apr 2024

Expected Output:

Id 01 Status is changed to Bad
Id 03 Status is changed to Good
Id 03 Date changed is to Apr 2024

Any suggestion/idea to code for the expected output. Thanks

CodePudding user response:

I take it from data frames you are using pandas?

Pandas text export is easy enough as per the link to https://stackoverflow.com/a/31247247/16367225 posted in the comment to your question.

Get whatever result you want in a one-row-per-one-df format (lookup pandas merge if you are unclear on how joins between frames work), and then do something like this: How to iterate over rows in a DataFrame in Pandas which describes how to iterate over rows.

Make a variable for what changed, e.g, StatusChange.

Then iterate over it to make your output

Then that particular row might look something like:

    f.writelines(f'Id {row['id']} Status is changed to {row['status']})

If you combine that sort of iterate over one df approach with the text output from the comment you should be right

CodePudding user response:

There is a method in pandas which could help you in doing that:

df.compare will compare 2 different dataframes and return which values changed in each column for the data records

data1 = [[1, 'ABC', 'Good', 'Mar 2023'], [2, 'BAC', 'Good', 'Feb 2024'], [3, 'CBA', 'Bad', 'Apr 2022']]
df1 = pd.DataFrame(data1, columns=['Id', 'Customer', 'Status', 'Date'])

data2 = [[1, 'ABC', 'Bad', 'Mar 2023'], [2, 'BAC', 'Good', 'Feb 2024'], [3, 'CBA', 'Good', 'Apr 2024']]
df2 = pd.DataFrame(data2, columns=['Id', 'Customer', 'Status', 'Date'])

print("Dataframe difference -- \n")
print(df1.compare(df2))

print("Dataframe difference keeping equal values -- \n")
print(df1.compare(df2, keep_equal=True))

Output:

Dataframe difference -- 

  Status            Date          
    self other      self     other
0   Good   Bad       NaN       NaN
2    Bad  Good  Apr 2022  Apr 2024

Dataframe difference keeping equal values -- 

  Status            Date          
    self other      self     other
0   Good   Bad  Mar 2023  Mar 2023
2    Bad  Good  Apr 2022  Apr 2024
  • Related