I want to compare two dataframes by index using pd.DataFrame.eq() to get a dataframe of True/False values.
However, the two dataframes do not share all index labels: df1 contains labels that are not in df2, and vice versa.
For each row, I want to know whether columns 'a', 'b' and 'c' of df1 hold the same value (0 or 1) as in df2.
import pandas as pd
df1 = pd.DataFrame({'a':[1,1,1,1,1], 'b':[0,1,0,1,0], 'c':[1,0,0,1,1]}, index=['1_1', '1_2', '2_1', '2_2', '2_3'])
df2 = pd.DataFrame({'a':[0,1,1,1,1,0], 'b':[1,0,1,0,1,0], 'c':[1,1,1,1,1,0]}, index=['1_1', '1_2', '1_3', '2_2', '2_3', '2_4'])
df1.eq(df2)
yields
1_1,False,False,True
1_2,True,False,False
1_3,False,False,False
2_1,False,False,False
2_2,True,False,True
2_3,True,False,True
2_4,False,False,False
but it should look like
1_1,False,False,True
1_2,True,False,False
2_2,True,False,True
2_3,True,False,True
I am not quite sure how to handle the index labels that appear in only one of the dataframes. I was thinking about merging the dfs, but then I stumble at comparing the columns.
Thanks for the help
CodePudding user response:
It seems you only want to keep the comparison results for index labels that exist in both dataframes. In that case, you can get the common set of labels with
idx = df1.index.intersection(df2.index)
then
df1.loc[idx].eq(df2.loc[idx])
or
df1.eq(df2).loc[idx]
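Putting the pieces together, here is a minimal self-contained sketch using the dataframes from the question. The variable names restricted_first and filtered_after are only illustrative and do not come from the original post.

```python
import pandas as pd

df1 = pd.DataFrame(
    {'a': [1, 1, 1, 1, 1], 'b': [0, 1, 0, 1, 0], 'c': [1, 0, 0, 1, 1]},
    index=['1_1', '1_2', '2_1', '2_2', '2_3'],
)
df2 = pd.DataFrame(
    {'a': [0, 1, 1, 1, 1, 0], 'b': [1, 0, 1, 0, 1, 0], 'c': [1, 1, 1, 1, 1, 0]},
    index=['1_1', '1_2', '1_3', '2_2', '2_3', '2_4'],
)

# Index labels present in both dataframes
idx = df1.index.intersection(df2.index)

# Variant 1: restrict both frames to the shared labels, then compare
restricted_first = df1.loc[idx].eq(df2.loc[idx])

# Variant 2: compare on the union of labels, then keep only the shared rows
filtered_after = df1.eq(df2).loc[idx]

print(restricted_first)
```

Both variants should give the same four rows here. The first compares only the shared rows, while the second lets eq() align the frames on the union of labels first (rows missing from one side compare as False) and then discards them.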