Home > Back-end >  Get the column name that has different value Python pandas
Get the column name that has different value Python pandas

Time:07-01

I have 2 dataframes with same column names, I want to get the column name that has different value.

df1(old file)

   SYMBOL  PPUP  R_5UP  R1UP  R2UP R2_5UP  R3UP  R3_5UP  R4UP
  NMRUSDT  True   True  True  True   True   NaN     NaN   NaN
SPELLUSDT  True   True   NaN   NaN    NaN   NaN     NaN   NaN
 YFIIUSDT  True   True  True   NaN    NaN   NaN     NaN   NaN

df2 (new file)

   SYMBOL  PPUP R_5UP  R1UP  R2UP R2_5UP  R3UP  R3_5UP  R4UP
  NMRUSDT  True  True  True  True   True   NaN     NaN   NaN
SPELLUSDT  True   NaN   NaN   NaN    NaN   NaN     NaN   NaN
 YFIIUSDT  True  True  True  True    NaN   NaN     NaN   NaN

DF expected result:

   SYMBOL oldfilelasttruecolumn newfilelasttruecolumn  stepgain
SPELLUSDT                 R_5UP                  PPUP        -1
 YFIIUSDT                  R1UP                  R2UP         1

I only managed to remove all the rows that has the exacts same values using

duplicates = pd.concat([df1,df2]).drop_duplicates(keep=False).to_string(index=False, header=False)

but now I'm stucked. This seems a little bit hard to do.

CodePudding user response:

You can use:

idx1 = df1.set_index('SYMBOL').stack().groupby(level='SYMBOL').sum()
idx2 = df2.set_index('SYMBOL').stack().groupby(level='SYMBOL').sum()

out = (pd.DataFrame({'old': df1.columns.to_numpy()[list(idx1)],
                     'new': df2.columns.to_numpy()[list(idx2)],
                     'stepgain': idx2-idx1
                    })
         .reset_index().loc[lambda d: d['stepgain'].ne(0)]
      )

output:

      SYMBOL    old   new stepgain
1  SPELLUSDT  R_5UP  PPUP       -1
2   YFIIUSDT   R1UP  R2UP        1
  • Related