Both CSV files are identical, yet I get a mismatch error in my Python code output

I have written a script that compares two CSV files and reports a mismatch whenever their values differ.

So, below is my script:

import pandas as pd
def main():
    sheet1 = pd.read_csv(filepath_or_buffer = '1.csv')
    sheet2 = pd.read_csv(filepath_or_buffer = '2.csv')
    # Iterating the Columns Names of both Sheets
    for i,j in zip(sheet1,sheet2):

        # Creating empty lists to append the columns values
        a,b =[],[]

        # Iterating the columns values
        for m, n in zip(sheet1[i],sheet2[j]):

            # Appending values in lists
            a.append(m)
            b.append(n)

        # Sorting the lists
        a.sort()
        b.sort()

        # Iterating the list's values and comparing them
        for m, n in zip(range(len(a)), range(len(b))):
            if a[m] != b[n]:
                print('Column name : \'{}\' and Row Number : {}'.format(i,m))


if __name__ == '__main__':
    main()

My output is:

Column name : 'PL' and Row Number : 0
Column name : 'PL' and Row Number : 1
Column name : 'PL' and Row Number : 2

FYI: in both files, the PL column contains 'null' values, yet the script still reports a mismatch for it.

Can anyone help me pinpoint the cause, or suggest how to debug this?

CodePudding user response:

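By default, pd.read_csv parses 'null' (and empty cells) as NaN, and NaN never compares equal to anything, including itself. So a[m] != b[n] is True whenever both values are NaN. You should also check that the two values are not both missing. Use pd.isna for this rather than "is not np.nan": values pulled out of a Series are not guaranteed to be the same object as np.nan, so an identity check can silently fail.

if a[m] != b[n] and not (pd.isna(a[m]) and pd.isna(b[n])):
    print('Column name : \'{}\' and Row Number : {}'.format(i,m))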
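
To see why the PL column is flagged even though both files look the same, here is a minimal sketch. The inline CSV text and the helper name values_differ are made up for illustration; they stand in for the asker's 1.csv and comparison step, which are not shown in full.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for 1.csv: the PL column holds the literal text "null".
csv_text = "ID,PL\n1,null\n2,null\n3,null\n"
sheet = pd.read_csv(io.StringIO(csv_text))

# read_csv treats "null" as a missing value by default, so PL becomes NaN.
pl_is_missing = bool(sheet["PL"].isna().all())

# NaN is never equal to NaN, so a plain != reports a mismatch.
nan_not_equal = float("nan") != float("nan")


def values_differ(x, y):
    """Return True when x and y differ, treating two missing values as equal."""
    if pd.isna(x) and pd.isna(y):
        return False
    return x != y


print(pl_is_missing, nan_not_equal)
print(values_differ(np.nan, np.nan), values_differ(1, 2))
```

Swapping the a[m] != b[n] test for a helper like this keeps the rest of the loop unchanged while no longer flagging rows where both files have a missing value.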

CodePudding user response:

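You could also look at pandas' built-in DataFrame.compare, which lists the cells that differ between two DataFrames and does not report matching NaN values as differences.

However, it only works on identically labeled DataFrames, i.e. both must have the same index and the same column labels.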
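
As a rough sketch of that approach, assuming pandas 1.1 or later (where DataFrame.compare was added): the inline CSV text below is invented sample data, not the asker's files.

```python
import io

import pandas as pd

# Hypothetical sample data: the two frames differ only in the last PL value.
first = pd.read_csv(io.StringIO("ID,PL\n1,null\n2,null\n3,5\n"))
second = pd.read_csv(io.StringIO("ID,PL\n1,null\n2,null\n3,7\n"))

# equals() treats NaNs in the same position as equal.
same = first.equals(first.copy())

# compare() keeps only the cells that differ; matching NaNs are not listed.
diff = first.compare(second)

print(same)
print(diff)
```

Here diff has one row (index 2) with the PL values from each frame side by side under the sub-columns self and other. Note that the original script sorts each column before comparing, while compare works position by position, so the two are only equivalent when row order matters.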