Difference of specific rows between two DataFrames with different index types

Time:05-24

I have two DataFrames, df1 and df2, with the same columns but different index types. I am trying to create df3, which contains the absolute pairwise difference of two specific rows (for example, row Type1 of df1 and row 2022-01-09 of df2). If either value is NaN, the corresponding value in df3 should also be NaN.

df1: 
        ID1 ID2 ID3 ID4
Type                
Type1   1   0   1   NaN
Type2   1   0   0   0.0

df2:
            ID1 ID2 ID3 ID4
2022-01-02  1   0   1   0
2022-01-05  1   0   0   1
2022-01-09  1   1   0   1
2022-01-10  0   0   1   1

df3:
        ID1 ID2 ID3 ID4
Type                
Type1   0   1   1   NaN
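The expected df3 row is the element-wise absolute difference of the two selected rows: |1 - 1| = 0, |0 - 1| = 1, |1 - 0| = 1, and NaN for ID4, because any arithmetic involving NaN yields NaN. A minimal check with plain NumPy, using values copied by hand from the tables above (the variable names are illustrative only):

```python
import numpy as np

# Values copied from the example tables above
type1 = np.array([1, 0, 1, np.nan])  # df1, row 'Type1'
jan09 = np.array([1, 1, 0, 1])       # df2, row 2022-01-09

# Element-wise absolute difference; NaN - 1 is NaN, so ID4 stays NaN
result = np.abs(type1 - jan09)
print(result)
```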

For reproducibility:

import datetime
import pandas as pd
import numpy as np

df1 = pd.DataFrame({
    'Type':['Type1', 'Type2'],
    'ID1':[1, 1], 
    'ID2':[0, 0], 
    'ID3':[1, 0],
    'ID4':[np.nan, 0]})
df1 = df1.set_index('Type')

df2 = pd.DataFrame({
    'Date':['2022-01-02', '2022-01-05', '2022-01-09', '2022-01-10'],
    'ID1':[1, 1, 1, 0], 
    'ID2':[0, 0, 1, 0], 
    'ID3':[1, 0, 0, 1],
    'ID4':[0, 1, 1, 1]})
df2 = df2.set_index('Date')
df2.index = pd.to_datetime(df2.index).date

So far, I only tried it with .diff().
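For context, DataFrame.diff works inside a single DataFrame: by default (periods=1) it subtracts each row's predecessor from that row, so the first row becomes NaN. That is why it cannot pair a row of df1 with a row of df2. A small illustration on df2, with the index kept as plain strings for brevity:

```python
import pandas as pd

# Same values as df2 in the question; string index used here for simplicity
df2 = pd.DataFrame(
    {'ID1': [1, 1, 1, 0], 'ID2': [0, 0, 1, 0], 'ID3': [1, 0, 0, 1], 'ID4': [0, 1, 1, 1]},
    index=['2022-01-02', '2022-01-05', '2022-01-09', '2022-01-10'],
)

# Each row minus the previous row of the SAME DataFrame;
# the first row has no predecessor and is therefore all NaN
print(df2.diff())
```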

Thanks a lot for any suggestions!

Answer:

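Use DataFrame.loc to select the rows. The solution is simpler with a DatetimeIndex, because the row can then be selected with the string '2022-01-09'. Subtract the Series (the selected df2 row) from the one-row DataFrame df1.loc[['Type1']] and convert the result to absolute values: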

df2.index = pd.to_datetime(df2.index)


df = df1.loc[['Type1']].sub(df2.loc['2022-01-09']).abs()
print(df)
       ID1  ID2  ID3  ID4
Type                     
Type1    0    1    1  NaN
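The subtraction works because DataFrame.sub with a Series argument aligns the Series index with the DataFrame's column labels (axis='columns' is the default), and the double brackets in df1.loc[['Type1']] keep a one-row DataFrame, so the Type index survives. A minimal sketch, with the df2 row written out by hand in a shuffled column order to show that matching is by label, not by position:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    {'ID1': [1, 1], 'ID2': [0, 0], 'ID3': [1, 0], 'ID4': [np.nan, 0]},
    index=pd.Index(['Type1', 'Type2'], name='Type'),
)

# Same values as df2.loc['2022-01-09'], deliberately in a different column order
row = pd.Series({'ID4': 1, 'ID3': 0, 'ID2': 1, 'ID1': 1})

# The Series index is matched against df1's column names, so ID1 is paired
# with ID1, and so on; NaN in ID4 propagates to the result
out = df1.loc[['Type1']].sub(row).abs()
print(out)
```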

If the index has to stay as datetime.date objects, select the row with a date object instead:

df2.index = pd.to_datetime(df2.index).date

df = df1.loc[['Type1']].sub(df2.loc[pd.Timestamp('2022-01-09').date()]).abs()
print(df)
       ID1  ID2  ID3  ID4
Type                     
Type1    0    1    1  NaN
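If this comparison is needed for several pairs of rows, the two steps can be wrapped in a small function. The name abs_row_diff below is hypothetical (it is not a pandas function); this is a sketch that assumes each label identifies exactly one row in its DataFrame:

```python
import datetime

import numpy as np
import pandas as pd


def abs_row_diff(left, left_label, right, right_label):
    """Hypothetical helper: |left row - right row| as a one-row DataFrame.

    Columns are matched by name, and NaN in either row propagates.
    """
    # [[label]] keeps a one-row DataFrame, preserving left's index label
    left_row = left.loc[[left_label]]
    # [label] returns a Series indexed by the column names
    right_row = right.loc[right_label]
    return left_row.sub(right_row).abs()


df1 = pd.DataFrame({'Type': ['Type1', 'Type2'], 'ID1': [1, 1], 'ID2': [0, 0],
                    'ID3': [1, 0], 'ID4': [np.nan, 0]}).set_index('Type')
df2 = pd.DataFrame({'Date': ['2022-01-02', '2022-01-05', '2022-01-09', '2022-01-10'],
                    'ID1': [1, 1, 1, 0], 'ID2': [0, 0, 1, 0],
                    'ID3': [1, 0, 0, 1], 'ID4': [0, 1, 1, 1]}).set_index('Date')
# Index of datetime.date objects, as in the question
df2.index = pd.to_datetime(df2.index).date

df3 = abs_row_diff(df1, 'Type1', df2, datetime.date(2022, 1, 9))
print(df3)
```

The caller still has to pass a label that matches the index type: a string such as '2022-01-09' works with a DatetimeIndex, while an index of datetime.date objects needs a date object.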
    