Get the index numbers of mismatching column values between two dataframes

Time:10-07

I have two similar DataFrames like this:

Dataframe 1:

ID        classification
1         MISS
2         MISS
3         CORRECT
4         MISS
5         CORRECT

Dataframe 2:

ID        classification
1         CORRECT
2         CORRECT
3         MISS
4         MISS
5         CORRECT

I would like to get the index number of every row where the value in the classification column differs between DataFrame 1 and DataFrame 2. The DataFrames are of similar length, and the remaining columns are also equal to each other.

Answer:

Because both DataFrames have the same number of rows and the same index, you can compare the classification columns for inequality with Series.ne and then filter with boolean indexing:

# ID is the index
df1.index[df1['classification'].ne(df2['classification'])]
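A minimal runnable version of the ID-as-index case, rebuilding the sample data from the question; the names `mask` and `mismatch_ids` are illustrative only. Series.ne aligns both Series on their index (here the ID) before comparing, so each row is compared with the row that has the same ID.

```python
import pandas as pd

# Sample data from the question, with ID set as the index
df1 = pd.DataFrame(
    {"ID": [1, 2, 3, 4, 5],
     "classification": ["MISS", "MISS", "CORRECT", "MISS", "CORRECT"]}
).set_index("ID")
df2 = pd.DataFrame(
    {"ID": [1, 2, 3, 4, 5],
     "classification": ["CORRECT", "CORRECT", "MISS", "MISS", "CORRECT"]}
).set_index("ID")

# Boolean mask: True where the two classifications differ (aligned on ID)
mask = df1["classification"].ne(df2["classification"])

# Index labels (IDs) of the mismatching rows
mismatch_ids = df1.index[mask]
print(mismatch_ids.tolist())  # [1, 2, 3]
```

IDs 1, 2 and 3 differ; IDs 4 and 5 match in both frames.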

Or, if ID is a column:

df1.loc[df1['classification'].ne(df2['classification']), 'ID']
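A sketch of the ID-as-column case on the same sample data, assuming both frames keep their default RangeIndex so that row labels line up. The result is a Series: its values are the mismatching IDs and its index holds the row numbers, so either can be read off depending on which "index number" is wanted.

```python
import pandas as pd

# Sample data from the question, with ID as an ordinary column
df1 = pd.DataFrame(
    {"ID": [1, 2, 3, 4, 5],
     "classification": ["MISS", "MISS", "CORRECT", "MISS", "CORRECT"]}
)
df2 = pd.DataFrame(
    {"ID": [1, 2, 3, 4, 5],
     "classification": ["CORRECT", "CORRECT", "MISS", "MISS", "CORRECT"]}
)

# True where the classification differs between rows with the same row label
mask = df1["classification"].ne(df2["classification"])

# Select the ID column for the mismatching rows only
mismatch_ids = df1.loc[mask, "ID"]
print(mismatch_ids.tolist())        # IDs:         [1, 2, 3]
print(mismatch_ids.index.tolist())  # row numbers: [0, 1, 2]
```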

If the DataFrames do not have the same number of rows, use Series.map instead; here ID is a column:

s = df2.set_index('ID')['classification']
df1.loc[df1['classification'].ne(df1['ID'].map(s)), 'ID']
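A sketch of the Series.map approach, using a hypothetical variant of the sample data in which df2 has only four rows, in a different order, and lacks ID 5. It assumes IDs are unique in df2, since the ID-to-classification lookup needs a unique index. An ID missing from df2 maps to NaN, and NaN compares as not-equal, so such IDs are reported as mismatches; the second filter is one optional way to keep only IDs present in both frames.

```python
import pandas as pd

df1 = pd.DataFrame(
    {"ID": [1, 2, 3, 4, 5],
     "classification": ["MISS", "MISS", "CORRECT", "MISS", "CORRECT"]}
)
# Hypothetical df2: fewer rows, shuffled order, ID 5 missing
df2 = pd.DataFrame(
    {"ID": [4, 3, 2, 1],
     "classification": ["MISS", "MISS", "CORRECT", "CORRECT"]}
)

# Lookup Series: ID -> classification taken from df2
s = df2.set_index("ID")["classification"]

# df2's classification for each df1 row, matched by ID (NaN if the ID is absent)
mapped = df1["ID"].map(s)

# IDs whose classification differs; IDs missing from df2 also count as mismatches
mismatch_ids = df1.loc[df1["classification"].ne(mapped), "ID"]
print(mismatch_ids.tolist())  # [1, 2, 3, 5]

# Optional: restrict to IDs that exist in both frames
both = df1.loc[df1["classification"].ne(mapped) & mapped.notna(), "ID"]
print(both.tolist())  # [1, 2, 3]
```

Because the lookup is by ID rather than by position, the row order of df2 does not matter.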