Home > OS >  pandas concat compare 2 dataframes with same column names how to build a differ column?
pandas concat compare 2 dataframes with same column names how to build a differ column?

Time:03-31

I'm using pandas concat to compare 2 dataframes have same columns and rows:

import pandas as pd

df=pd.read_csv(r'C:\Users\compare\T1.csv')
df2=pd.read_csv(r'C:\Users\compare\T2.csv')

index=['contract','RB','payee','fund']

df_all = pd.concat([df.set_index(index), df2.set_index(index)], 
                   axis='columns', keys=['First', 'Second'])

df_final = df_all.swaplevel(axis='columns')[df.columns[54:56]]

df_final

The output is:

                              SD1                SD2
                             First     Second    First   Second
contract    RB  payee   fund                
AG72916Z    2   1   W42      15622.9    15622.9  15622.9    15489.2
            4   1   W44      14697.8    14697.8  14697.8    14572.1
            8   1   W48      7388.56    7388.56  7388.56    7325.37
AL0024AZ    C3  1   202      226.585    226.59   220.366    220.37
            S3  1   204      804.059    804.06   781.99     781.99

My question is how can I add a differ column after each Second ,so that I can easily tell the comparison result,the output should looks like this:

                              SD1                         SD2
                             First     Second   differ    First     Second   differ
contract    RB  payee   fund                
AG72916Z    2   1   W42      15622.9    15622.9    0       15622.9  15489.2  133.7
            4   1   W44      14697.8    14697.8    0       14697.8  14572.1  125.7
            8   1   W48      7388.56    7388.56    0       7388.56  7325.37  63.19
AL0024AZ    C3  1   202      226.585    226.59     0.05    220.366  220.37  -0.004
            S3  1   204      804.059    804.06     0.01    781.99   781.99   0

CodePudding user response:

A bit tricky but necessary to keep ordering:

out = df_final.stack(level=0).assign(Diff=lambda x: x['First'] - x['Second']) \
              .stack().unstack(level=[-2, -1])
print(out)

# Output
                              SD1                         SD2                   
                            First    Second   Diff      First    Second     Diff
contract RB payee fund                                                          
AG72916Z 2  1     W42   15622.900  15622.90  0.000  15622.900  15489.20  133.700
         4  1     W44   14697.800  14697.80  0.000  14697.800  14572.10  125.700
         8  1     W48    7388.560   7388.56  0.000   7388.560   7325.37   63.190
AL0024AZ C3 1     202     226.585    226.59 -0.005    220.366    220.37   -0.004
         S3 1     204     804.059    804.06 -0.001    781.990    781.99    0.000
  • Related