I have DF1 with several int columns and DF2 with 1 int column
DF1
:
Year Industrial Consumer Discretionary Technology Utilities Energy Materials Communications Consumer Staples Health Care #No L1 US Agg Financials China Agg EU Agg
2001 5.884277 6.013842 6.216585 6.640594 6.701400 8.488806 7.175017 6.334284 6.082113 0.000000 5.439149 4.193736 4.686188 4.294788
2002 5.697814 6.277471 5.241045 6.608475 6.983511 8.089475 7.399775 5.882947 5.818563 7.250000 4.877012 3.635425 4.334125 3.944324
2003 5.144356 6.503754 6.270268 5.737079 6.466985 8.122228 7.040089 5.461827 5.385670 5.611753 4.163365 2.888026 3.955665 3.464020
2004 5.436486 6.463149 4.500574 5.329104 5.863406 7.562982 6.521106 5.990889 4.874258 6.554348 4.384878 3.502861 4.556418 3.412025
2005 5.003606 6.108812 5.732764 5.543677 6.131144 7.239053 7.228042 5.421092 5.561518 NaN 4.660754 3.970243 3.944251 3.106951
2006 4.505980 6.017253 4.923927 5.955308 5.799030 7.425253 6.942308
DF2
:
Year Values
2002 4.514752
2003 3.994849
2004 4.254575
2005 4.277520
2006 4.784476
etc..
The indexes are the same for 2 DataFrames.
The goal is to create DF3 while subtracting DF2 from every single column from DF1. (DF2 - DF1 = DF3
)
Anywhere where there is a nan, it should skip the math.
CodePudding user response:
Assuming "Year" is the index for both (if not, you can make it the index using set_index
), you can use sub
on axis:
df3 = df1.sub(df2['Values'], axis=0)
Output:
Industrial Consumer Discretionary Technology Utilities Energy \
Year
2001 NaN NaN NaN NaN NaN NaN
2002 1.183062 1.762719 0.726293 2.093723 2.468759 3.574723
2003 1.149507 2.508905 2.275419 1.742230 2.472136 4.127379
2004 1.181911 2.208574 0.245999 1.074529 1.608831 3.308407
2005 0.726086 1.831292 1.455244 1.266157 1.853624 2.961533
2006 -0.278496 1.232777 0.139451 1.170832 1.014554 2.640777
Materials Communications Consumer.1 Staples Health_Care US_Agg \
Year
2001 NaN NaN NaN NaN NaN NaN
2002 2.885023 1.368195 1.303811 2.735248 0.362260 -0.879327
2003 3.045240 1.466978 1.390821 1.616904 0.168516 -1.106823
2004 2.266531 1.736314 0.619683 2.299773 0.130303 -0.751714
2005 2.950522 1.143572 1.283998 NaN 0.383234 -0.307277
2006 2.157832 NaN NaN NaN NaN NaN
Financials China_Agg
Year
2001 NaN NaN
2002 -0.180627 -0.570428
2003 -0.039184 -0.530829
2004 0.301843 -0.842550
2005 -0.333269 -1.170569
2006 NaN NaN
If you want to subtract df1
from df2
instead, you can use rsub
instead of sub
. It not clear which one you want since you explain that you want df1-df2
but your formula is the opposite.