I have two DataFrames df1
(mxn) and df2
(mx1) as time series and I want to calculate the difference for each column between df1
and df2
which would look like df3
.
import pandas as pd
df1 = pd.DataFrame({
'Date':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
'01K W':[1.2, 0.4, 0.2, -0.4],
'02K W':[3.5, 3.2, 'nan', 'nan'],
'03K W':[-1, -2.3, 0.3, 2.4],
'04K W':[1.5, 2.6, 3.2, 4.2]})
df2 = pd.DataFrame({
'Date':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
'K W':[1, 1.5, 1.2, 0.8]})
df3 = pd.DataFrame({
'Date':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
'01K W':[0.2, 1.1, 1, 1.2],
'02K W':[2.5, 1.7, 'nan', 'nan'],
'03K W':[2, 3.8, 0.9, 1.6],
'04K W':[0.5, 1.1, 2, 3.4]})
Is there an easy way to build a difference column wise?
CodePudding user response:
You can set Date
as index, and use .sub
method:
df1.set_index('Date').sub(df2.set_index('Date')['K W'], axis='rows')
Output:
01K W 02K W 03K W 04K W
Date
2021-01-01 0.2 2.5 -2.0 0.5
2021-01-02 -1.1 1.7 -3.8 1.1
2021-01-03 -1.0 NaN -0.9 2.0
2021-01-04 -1.2 NaN 1.6 3.4
Note: you might want to add astype(float)
after set_index('Date')
to correct your data type.
CodePudding user response:
First you will need to use numeric values, not strings.
It looks like your 'Date' field represents your index. Pandas series can be added/subtracted element-wise based on their shared index so worth calling those out as an index. Then you can simply iterate through your df1 columns to apply df2 to each.
from numpy import nan
import pandas as pd
df1 = pd.DataFrame({
'Date':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
'01K W':[1.2, 0.4, 0.2, -0.4],
'02K W':[3.5, 3.2, nan, nan],
'03K W':[-1, -2.3, 0.3, 2.4],
'04K W':[1.5, 2.6, 3.2, 4.2]})
df2 = pd.DataFrame({
'Date':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
'K W':[1, 1.5, 1.2, 0.8]})
df1 = df1.set_index('Date')
df2 = df2.set_index('Date')
df3 = df1.copy()
for c in df1.columns:
df3[c] = df1[c] - df2['K W']
df3
Yields:
01K W 02K W 03K W 04K W
Date
2021-01-01 0.2 2.5 -2.0 0.5
2021-01-02 -1.1 1.7 -3.8 1.1
2021-01-03 -1.0 NaN -0.9 2.0
2021-01-04 -1.2 NaN 1.6 3.4
CodePudding user response:
Another way to do:
df4 = df1[['01K W', '02K W', '03K W', '04K W']].astype(float).subtract(df2['K W'].astype(float), axis=0).abs().join(df1['Date'])[['Date','01K W', '02K W', '03K W', '04K W']]
print(df4)