Home > database >  Calculating the difference column wise of two time series dataframe with different dimensions
Calculating the difference column wise of two time series dataframe with different dimensions

Time:11-11

I have two DataFrames df1 (mxn) and df2 (mx1) as time series and I want to calculate the difference for each column between df1 and df2 which would look like df3.

import pandas as pd
df1 = pd.DataFrame({
    'Date':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
    '01K W':[1.2, 0.4, 0.2, -0.4], 
    '02K W':[3.5, 3.2, 'nan', 'nan'], 
    '03K W':[-1, -2.3, 0.3, 2.4], 
    '04K W':[1.5, 2.6, 3.2, 4.2]})

df2 = pd.DataFrame({
    'Date':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
    'K W':[1, 1.5, 1.2, 0.8]})

df3 = pd.DataFrame({
    'Date':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
    '01K W':[0.2, 1.1, 1, 1.2], 
    '02K W':[2.5, 1.7, 'nan', 'nan'], 
    '03K W':[2, 3.8, 0.9, 1.6], 
    '04K W':[0.5, 1.1, 2, 3.4]})

Is there an easy way to build a difference column wise?

CodePudding user response:

You can set Date as index, and use .sub method:

df1.set_index('Date').sub(df2.set_index('Date')['K W'], axis='rows')

Output:

            01K W  02K W  03K W  04K W
Date                                  
2021-01-01    0.2    2.5   -2.0    0.5
2021-01-02   -1.1    1.7   -3.8    1.1
2021-01-03   -1.0    NaN   -0.9    2.0
2021-01-04   -1.2    NaN    1.6    3.4

Note: you might want to add astype(float) after set_index('Date') to correct your data type.

CodePudding user response:

First you will need to use numeric values, not strings.

It looks like your 'Date' field represents your index. Pandas series can be added/subtracted element-wise based on their shared index so worth calling those out as an index. Then you can simply iterate through your df1 columns to apply df2 to each.

from numpy import nan
import pandas as pd

df1 = pd.DataFrame({
    'Date':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
    '01K W':[1.2, 0.4, 0.2, -0.4], 
    '02K W':[3.5, 3.2, nan, nan], 
    '03K W':[-1, -2.3, 0.3, 2.4], 
    '04K W':[1.5, 2.6, 3.2, 4.2]})

df2 = pd.DataFrame({
    'Date':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
    'K W':[1, 1.5, 1.2, 0.8]})

df1 = df1.set_index('Date')
df2 = df2.set_index('Date')

df3 = df1.copy()

for c in df1.columns:
    df3[c] = df1[c] - df2['K W']
    
df3

Yields:

            01K W  02K W  03K W  04K W
Date                                  
2021-01-01    0.2    2.5   -2.0    0.5
2021-01-02   -1.1    1.7   -3.8    1.1
2021-01-03   -1.0    NaN   -0.9    2.0
2021-01-04   -1.2    NaN    1.6    3.4

CodePudding user response:

Another way to do:

df4 = df1[['01K W', '02K W', '03K W', '04K W']].astype(float).subtract(df2['K W'].astype(float), axis=0).abs().join(df1['Date'])[['Date','01K W', '02K W', '03K W', '04K W']]


print(df4)
  • Related