Both CSV files are identical, yet my Python script reports mismatches

Time: 04-11

I have written a script that compares two CSV files and reports a mismatch whenever it finds one.

Here is my script:

import pandas as pd


def main():
    sheet1 = pd.read_csv('1.csv')
    sheet2 = pd.read_csv('2.csv')

    # Iterate over the column names of both sheets
    for i, j in zip(sheet1, sheet2):
        # Copy each column's values into a list
        a, b = [], []
        for m, n in zip(sheet1[i], sheet2[j]):
            a.append(m)
            b.append(n)

        # Sort the lists
        a.sort()
        b.sort()

        # Compare the sorted lists value by value
        for m, n in zip(range(len(a)), range(len(b))):
            if a[m] != b[n]:
                print("Column name : '{}' and Row Number : {}".format(i, m))


if __name__ == '__main__':
    main()

My output is:

Column name : 'PL' and Row Number : 0
Column name : 'PL' and Row Number : 1
Column name : 'PL' and Row Number : 2

FYI: in both files the PL column contains 'null' values, yet the script still reports a mismatch for it.
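A minimal reproduction of the symptom, using in-memory CSV text in place of 1.csv and 2.csv (the ID column and the sample rows are made up for illustration). It shows that read_csv turns the text 'null' into NaN by default, and that NaN is never equal to itself:

```python
import io

import numpy as np
import pandas as pd

# Illustrative data: the PL column holds the literal text 'null'.
csv_text = "ID,PL\n1,null\n2,null\n3,null\n"

sheet1 = pd.read_csv(io.StringIO(csv_text))
sheet2 = pd.read_csv(io.StringIO(csv_text))

# By default read_csv treats 'null' as a missing value and stores NaN.
print(sheet1["PL"].isna().all())  # True

# NaN compares unequal to everything, including another NaN.
print(np.nan != np.nan)  # True

# So a plain != comparison flags every row of the "identical" PL columns.
mismatches = [row for row, (x, y) in enumerate(zip(sheet1["PL"], sheet2["PL"])) if x != y]
print(mismatches)  # [0, 1, 2]
```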

Can anyone help me pinpoint the cause, or suggest how to debug this?

Answer:

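By default pandas reads cells containing 'null' as NaN, and NaN never compares equal to itself (np.nan != np.nan is True), so every pair of missing values is reported as a mismatch. Skip the pair when both values are missing. Use pd.isna() rather than an identity check such as a[m] is not np.nan, because the NaN values you get when iterating a column are not necessarily the np.nan object itself:

if a[m] != b[n] and not (pd.isna(a[m]) and pd.isna(b[n])):
    print("Column name : '{}' and Row Number : {}".format(i, m))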
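A NaN-aware version of the question's column-by-column comparison might look like the sketch below. values_match and find_mismatches are hypothetical helper names, and the inline CSV text stands in for 1.csv and 2.csv. The sketch also swaps list.sort() for Series.sort_values(): because NaN compares false against everything, Python's list.sort() can leave NaN at arbitrary positions, whereas sort_values() places missing values last, so they line up in both columns.

```python
import io

import pandas as pd


def values_match(x, y):
    """Hypothetical helper: two missing values count as equal, otherwise use ==."""
    if pd.isna(x) and pd.isna(y):
        return True
    return bool(x == y)


def find_mismatches(sheet1, sheet2):
    """Hypothetical helper: return (column, position) pairs where the sorted
    columns differ, pairing columns by position as the question's script does."""
    mismatches = []
    for col1, col2 in zip(sheet1.columns, sheet2.columns):
        # sort_values() puts NaN last by default, so missing values line up.
        a = sheet1[col1].sort_values(ignore_index=True)
        b = sheet2[col2].sort_values(ignore_index=True)
        for pos, (x, y) in enumerate(zip(a, b)):
            if not values_match(x, y):
                mismatches.append((col1, pos))
    return mismatches


# Usage: same values in a different row order, with 'null' in every PL cell.
csv1 = "ID,PL\n1,null\n2,null\n3,null\n"
csv2 = "ID,PL\n3,null\n1,null\n2,null\n"
sheet1 = pd.read_csv(io.StringIO(csv1))
sheet2 = pd.read_csv(io.StringIO(csv2))
print(find_mismatches(sheet1, sheet2))  # []
```

Reading the files with pd.read_csv('1.csv') and pd.read_csv('2.csv') instead of the in-memory text should work the same way.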

Answer:

You could also use pandas' built-in DataFrame.compare() method (available since pandas 1.1).

However, it only works for DataFrames with identical row and column labels (same shape, same labels).
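A small sketch of DataFrame.compare() on two in-memory frames (the column names and values are invented for illustration). Matching NaNs are not reported as differences, so the 'null' cells drop out and only genuinely different values remain:

```python
import io

import pandas as pd

# Illustrative data: PL is all 'null'; one Qty value differs.
csv1 = "ID,PL,Qty\n1,null,10\n2,null,20\n3,null,30\n"
csv2 = "ID,PL,Qty\n1,null,10\n2,null,25\n3,null,30\n"

sheet1 = pd.read_csv(io.StringIO(csv1))
sheet2 = pd.read_csv(io.StringIO(csv2))

# compare() requires identically labelled frames (same shape, same row and
# column labels); it raises ValueError otherwise.
diff = sheet1.compare(sheet2)

# Rows and columns with no differences are dropped, so only row 1 / Qty remains,
# shown as a ("Qty", "self") and ("Qty", "other") column pair.
print(diff)

# equals() gives a single yes/no answer and also treats NaNs in the same
# position as equal.
print(sheet1.equals(pd.read_csv(io.StringIO(csv1))))  # True
```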
