I have 2 dataframes with same column names, I want to get the column name that has different value.
df1(old file)
SYMBOL PPUP R_5UP R1UP R2UP R2_5UP R3UP R3_5UP R4UP
NMRUSDT True True True True True NaN NaN NaN
SPELLUSDT True True NaN NaN NaN NaN NaN NaN
YFIIUSDT True True True NaN NaN NaN NaN NaN
df2 (new file)
SYMBOL PPUP R_5UP R1UP R2UP R2_5UP R3UP R3_5UP R4UP
NMRUSDT True True True True True NaN NaN NaN
SPELLUSDT True NaN NaN NaN NaN NaN NaN NaN
YFIIUSDT True True True True NaN NaN NaN NaN
DF expected result:
SYMBOL oldfilelasttruecolumn newfilelasttruecolumn stepgain
SPELLUSDT R_5UP PPUP -1
YFIIUSDT R1UP R2UP 1
I only managed to remove all the rows that has the exacts same values using
duplicates = pd.concat([df1,df2]).drop_duplicates(keep=False).to_string(index=False, header=False)
but now I'm stucked. This seems a little bit hard to do.
CodePudding user response:
You can use:
idx1 = df1.set_index('SYMBOL').stack().groupby(level='SYMBOL').sum()
idx2 = df2.set_index('SYMBOL').stack().groupby(level='SYMBOL').sum()
out = (pd.DataFrame({'old': df1.columns.to_numpy()[list(idx1)],
'new': df2.columns.to_numpy()[list(idx2)],
'stepgain': idx2-idx1
})
.reset_index().loc[lambda d: d['stepgain'].ne(0)]
)
output:
SYMBOL old new stepgain
1 SPELLUSDT R_5UP PPUP -1
2 YFIIUSDT R1UP R2UP 1