I have written a script that compares two Excel files and reports a mismatch whenever the values differ.
Here is my script:
import pandas as pd

def main():
    sheet1 = pd.read_csv(filepath_or_buffer = '1.csv')
    sheet2 = pd.read_csv(filepath_or_buffer = '2.csv')
    # Iterating the Columns Names of both Sheets
    for i,j in zip(sheet1,sheet2):
        # Creating empty lists to append the columns values
        a,b =[],[]
        # Iterating the columns values
        for m, n in zip(sheet1[i],sheet2[j]):
            # Appending values in lists
            a.append(m)
            b.append(n)
        # Sorting the lists
        a.sort()
        b.sort()
        # Iterating the list's values and comparing them
        for m, n in zip(range(len(a)), range(len(b))):
            if a[m] != b[n]:
                print('Column name : \'{}\' and Row Number : {}'.format(i,m))

if __name__ == '__main__':
    main()
My output is:
Column name : 'PL' and Row Number : 0
Column name : 'PL' and Row Number : 1
Column name : 'PL' and Row Number : 2
FYI: in both Excel files, the PL column contains only 'null' values, yet the script still reports a mismatch for it.
Can anyone help me pinpoint the cause and how to debug it?
CodePudding user response:
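NaN never compares equal to itself, so a[m] != b[n] is True even when both cells are empty. You should also check that the two values are not both missing. pd.isna is safer for this than an identity test against np.nan, because values read from a DataFrame are not guaranteed to be the np.nan object itself:
if a[m] != b[n] and not (pd.isna(a[m]) and pd.isna(b[n])):
    print('Column name : \'{}\' and Row Number : {}'.format(i,m))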
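To illustrate a NaN-aware comparison, here is a minimal sketch. The helper names values_differ and find_mismatches and the two small in-memory DataFrames are hypothetical stand-ins for 1.csv and 2.csv, not part of the original script. Unlike the script in the question, it compares rows in their original order instead of sorting each column first, so the reported row numbers refer to the actual rows.

```python
import numpy as np
import pandas as pd


def values_differ(x, y):
    """Return True when x and y differ, treating two missing values as equal."""
    if pd.isna(x) and pd.isna(y):
        return False
    return x != y


def find_mismatches(sheet1, sheet2):
    """Collect (column name, row number) pairs where the two frames disagree."""
    mismatches = []
    for col1, col2 in zip(sheet1, sheet2):
        for row, (x, y) in enumerate(zip(sheet1[col1], sheet2[col2])):
            if values_differ(x, y):
                mismatches.append((col1, row))
    return mismatches


# Hypothetical data standing in for the two CSV files (assumption).
sheet1 = pd.DataFrame({"ID": [1, 2, 3], "PL": [np.nan, np.nan, np.nan]})
sheet2 = pd.DataFrame({"ID": [1, 2, 4], "PL": [np.nan, np.nan, np.nan]})

# NaN is never equal to NaN, which is why the plain != test flags the PL column.
nan_equal = float("nan") == float("nan")

mismatches = find_mismatches(sheet1, sheet2)
print(nan_equal)
print(mismatches)
```

With this data, only the differing ID value should be reported; the all-NaN PL column is no longer flagged.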
CodePudding user response:
You should probably take a look at the pandas DataFrame.compare method.
However, it only works for DataFrames with identical labels, i.e. the same shape and the same row and column labels.
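As a rough sketch of that approach, assuming pandas 1.1 or newer (where DataFrame.compare is available) and using two small hypothetical DataFrames in place of the CSV files:

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for 1.csv and 2.csv (assumption).
sheet1 = pd.DataFrame({"ID": [1, 2, 3], "PL": [np.nan, np.nan, np.nan]})
sheet2 = pd.DataFrame({"ID": [1, 2, 4], "PL": [np.nan, np.nan, np.nan]})

# compare() keeps only the cells that differ; positions where both
# frames hold NaN are treated as equal, so PL does not show up.
diff = sheet1.compare(sheet2)
print(diff)
```

The result has one row per differing index label and a "self"/"other" pair of columns for each column that contains a difference. If the two files can have different shapes or labels, compare raises a ValueError, so the frames would need to be aligned first.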