I have two pandas dataframes with same columns say name, jan, feb, march, april. I want to compare the two dataframes and find out the name, month combination for which I have value in my first dataframe but not in my second dataframe.
df1:
Name | jan | feb | March |
---|---|---|---|
ABC | 125 | 225 | NaN |
DEF | NaN | 30 | 214 |
df2:
Name | jan | feb | March |
---|---|---|---|
ABC | 125 | NaN | NaN |
XYZ | 254 | 130 | NaN |
Expected output:
Name | Month |
---|---|
ABC | feb |
Def | feb |
Def | March |
I tried to merge the two dataframes, but it is not giving me the expected result. I'm not sure how to proceed with this.
CodePudding user response:
Here is a possible approach:
# only if not already index
# df1 = df1.set_index('Name')
# df2 = df2.set_index('Name')
s = ((df1.notna()&df1.ne(df2.reindex_like(df1)))
.rename_axis('Month', axis=1).stack())
s[s].reset_index().drop(0, axis=1)
output:
Name Month
0 ABC feb
1 DEF feb
2 DEF March
CodePudding user response:
set_index
to "Name" stack
reset_index
the DataFrames and outer merge
on name and month. Then filter the merged DataFrame by the condition:
out = df1.set_index('Name').stack().reset_index().merge(df2.set_index('Name').stack().reset_index(), on=['Name','level_1'], how='outer')
out = out.loc[out['0_x'].notna() & out['0_y'].isna(), ['Name','level_1']].rename(columns={'level_1':'Month'})
Output:
Name Month
1 ABC feb
2 DEF feb
3 DEF March