Home > Back-end >  How to compare two dataframes and find the combination for which there is value in first dataframe b
How to compare two dataframes and find the combination for which there is value in first dataframe b

Time:02-10

I have two pandas dataframes with same columns say name, jan, feb, march, april. I want to compare the two dataframes and find out the name, month combination for which I have value in my first dataframe but not in my second dataframe.

df1:

Name jan feb March
ABC 125 225 NaN
DEF NaN 30 214

df2:

Name jan feb March
ABC 125 NaN NaN
XYZ 254 130 NaN

Expected output:

Name Month
ABC feb
Def feb
Def March

I tried to merge the two dataframes, but it is not giving me the expected result. I'm not sure how to proceed with this.

CodePudding user response:

Here is a possible approach:

# only if not already index
# df1 = df1.set_index('Name')
# df2 = df2.set_index('Name')

s = ((df1.notna()&df1.ne(df2.reindex_like(df1)))
     .rename_axis('Month', axis=1).stack())
s[s].reset_index().drop(0, axis=1)

output:

   Name  Month
0  ABC     feb
1  DEF     feb
2  DEF   March

CodePudding user response:

set_index to "Name" stack reset_index the DataFrames and outer merge on name and month. Then filter the merged DataFrame by the condition:

out = df1.set_index('Name').stack().reset_index().merge(df2.set_index('Name').stack().reset_index(), on=['Name','level_1'], how='outer')
out = out.loc[out['0_x'].notna() & out['0_y'].isna(), ['Name','level_1']].rename(columns={'level_1':'Month'})

Output:

  Name  Month
1  ABC    feb
2  DEF    feb
3  DEF  March
  • Related