Home > Mobile >  Can you subtract from multi DF columns based on DF2 single column?
Can you subtract from multi DF columns based on DF2 single column?

Time:03-03

I have DF1 with several int columns and DF2 with 1 int column

DF1:

Year    Industrial  Consumer Discretionary  Technology  Utilities   Energy  Materials   Communications  Consumer Staples    Health Care #No L1  US Agg  Financials  China Agg   EU Agg                                                  
2001    5.884277    6.013842    6.216585    6.640594    6.701400    8.488806    7.175017    6.334284    6.082113    0.000000    5.439149    4.193736    4.686188    4.294788
2002    5.697814    6.277471    5.241045    6.608475    6.983511    8.089475    7.399775    5.882947    5.818563    7.250000    4.877012    3.635425    4.334125    3.944324
2003    5.144356    6.503754    6.270268    5.737079    6.466985    8.122228    7.040089    5.461827    5.385670    5.611753    4.163365    2.888026    3.955665    3.464020
2004    5.436486    6.463149    4.500574    5.329104    5.863406    7.562982    6.521106    5.990889    4.874258    6.554348    4.384878    3.502861    4.556418    3.412025
2005    5.003606    6.108812    5.732764    5.543677    6.131144    7.239053    7.228042    5.421092    5.561518    NaN  4.660754   3.970243    3.944251    3.106951
2006    4.505980    6.017253    4.923927    5.955308    5.799030    7.425253    6.942308

DF2:

Year    Values
2002    4.514752
2003    3.994849
2004    4.254575
2005    4.277520
2006    4.784476
etc..

The indexes are the same for 2 DataFrames.

The goal is to create DF3 while subtracting DF2 from every single column from DF1. (DF2 - DF1 = DF3)

Anywhere where there is a nan, it should skip the math.

CodePudding user response:

Assuming "Year" is the index for both (if not, you can make it the index using set_index), you can use sub on axis:

df3 = df1.sub(df2['Values'], axis=0)

Output:

      Industrial  Consumer  Discretionary  Technology  Utilities    Energy  \
Year                                                                         
2001         NaN       NaN            NaN         NaN        NaN       NaN   
2002    1.183062  1.762719       0.726293    2.093723   2.468759  3.574723   
2003    1.149507  2.508905       2.275419    1.742230   2.472136  4.127379   
2004    1.181911  2.208574       0.245999    1.074529   1.608831  3.308407   
2005    0.726086  1.831292       1.455244    1.266157   1.853624  2.961533   
2006   -0.278496  1.232777       0.139451    1.170832   1.014554  2.640777   

      Materials  Communications  Consumer.1   Staples  Health_Care    US_Agg  \
Year                                                                           
2001        NaN             NaN         NaN       NaN          NaN       NaN   
2002   2.885023        1.368195    1.303811  2.735248     0.362260 -0.879327   
2003   3.045240        1.466978    1.390821  1.616904     0.168516 -1.106823   
2004   2.266531        1.736314    0.619683  2.299773     0.130303 -0.751714   
2005   2.950522        1.143572    1.283998       NaN     0.383234 -0.307277   
2006   2.157832             NaN         NaN       NaN          NaN       NaN   

      Financials  China_Agg  
Year                         
2001         NaN        NaN  
2002   -0.180627  -0.570428  
2003   -0.039184  -0.530829  
2004    0.301843  -0.842550  
2005   -0.333269  -1.170569  
2006         NaN        NaN  

If you want to subtract df1 from df2 instead, you can use rsub instead of sub. It not clear which one you want since you explain that you want df1-df2 but your formula is the opposite.

  • Related