I have two DataFrames, df1 and df2, with the same columns but different index types. I am trying to create df3, which holds the element-wise difference between two specific rows (for example, row Type1 of df1 and row 2022-01-09 of df2). If either value is NaN, I also want NaN in the difference DataFrame df3.
df1:
       ID1  ID2  ID3  ID4
Type
Type1    1    0    1  NaN
Type2    1    0    0  0.0
df2:
            ID1  ID2  ID3  ID4
2022-01-02    1    0    1    0
2022-01-05    1    0    0    1
2022-01-09    1    1    0    1
2022-01-10    0    0    1    1
df3:
       ID1  ID2  ID3  ID4
Type
Type1    0    1    1  NaN
For reproducibility:
import datetime
import pandas as pd
import numpy as np
df1 = pd.DataFrame({
    'Type': ['Type1', 'Type2'],
    'ID1': [1, 1],
    'ID2': [0, 0],
    'ID3': [1, 0],
    'ID4': [np.nan, 0]})
df1 = df1.set_index('Type')
df2 = pd.DataFrame({
    'Date': ['2022-01-02', '2022-01-05', '2022-01-09', '2022-01-10'],
    'ID1': [1, 1, 1, 0],
    'ID2': [0, 0, 1, 0],
    'ID3': [1, 0, 0, 1],
    'ID4': [0, 1, 1, 1]})
df2 = df2.set_index('Date')
df2.index = pd.to_datetime(df2.index).date
So far, I have only tried .diff().
Thanks a lot for suggestions!
Answer:
Use DataFrame.loc to select the rows. The solution with a DatetimeIndex is simpler: select the df2 row by the string '2022-01-09', subtract that Series from the one-row DataFrame df1.loc[['Type1']], and convert the output to absolute values:
df2.index = pd.to_datetime(df2.index)
df = df1.loc[['Type1']].sub(df2.loc['2022-01-09']).abs()
print(df)
       ID1  ID2  ID3  ID4
Type
Type1    0    1    1  NaN
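
The double brackets in df1.loc[['Type1']] matter here: a list selector keeps the result as a one-row DataFrame (so the Type index survives), while the scalar selector on df2 returns a Series whose index is the column names. sub() then aligns that Series with the DataFrame columns. A minimal sketch using the data from the question; the names left, right, and result are only illustrative:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    {"ID1": [1, 1], "ID2": [0, 0], "ID3": [1, 0], "ID4": [np.nan, 0]},
    index=pd.Index(["Type1", "Type2"], name="Type"),
)
df2 = pd.DataFrame(
    {"ID1": [1, 1, 1, 0], "ID2": [0, 0, 1, 0],
     "ID3": [1, 0, 0, 1], "ID4": [0, 1, 1, 1]},
    index=pd.to_datetime(["2022-01-02", "2022-01-05", "2022-01-09", "2022-01-10"]),
)

left = df1.loc[["Type1"]]      # list selector -> one-row DataFrame
right = df2.loc["2022-01-09"]  # scalar selector -> Series indexed by column name

# sub() matches the Series index against the DataFrame columns;
# a NaN in either operand stays NaN in the result.
result = left.sub(right).abs()
print(type(left).__name__, type(right).__name__)
print(result)
```

With single brackets on both sides, the subtraction would return a Series instead of a DataFrame, and the Type1 label would only survive as the Series name.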
If the index holds datetime.date objects instead, select the row with a date object:
df2.index = pd.to_datetime(df2.index).date
df = df1.loc[['Type1']].sub(df2.loc[pd.Timestamp('2022-01-09').date()]).abs()
print(df)
       ID1  ID2  ID3  ID4
Type
Type1    0    1    1  NaN
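
If the same comparison is needed for several row pairs, the pattern can be wrapped in a small function. This is only a sketch: abs_row_diff is a hypothetical helper name, not a pandas function, and it assumes each key matches exactly one row. It is shown here with the datetime.date index from the question:

```python
import datetime

import numpy as np
import pandas as pd


def abs_row_diff(left, left_key, right, right_key):
    """Hypothetical helper: |left row - right row| as a one-row DataFrame."""
    right_row = right.loc[right_key]  # Series indexed by column name
    # The list selector keeps left's index label in the result.
    return left.loc[[left_key]].sub(right_row, axis="columns").abs()


df1 = pd.DataFrame(
    {"ID1": [1, 1], "ID2": [0, 0], "ID3": [1, 0], "ID4": [np.nan, 0]},
    index=pd.Index(["Type1", "Type2"], name="Type"),
)
df2 = pd.DataFrame(
    {"ID1": [1, 1, 1, 0], "ID2": [0, 0, 1, 0],
     "ID3": [1, 0, 0, 1], "ID4": [0, 1, 1, 1]},
    # .date turns the DatetimeIndex into an array of datetime.date objects
    index=pd.to_datetime(
        ["2022-01-02", "2022-01-05", "2022-01-09", "2022-01-10"]
    ).date,
)

df3 = abs_row_diff(df1, "Type1", df2, datetime.date(2022, 1, 9))
other = abs_row_diff(df1, "Type2", df2, datetime.date(2022, 1, 5))
print(df3)
print(other)
```

A key that is missing from either index raises a KeyError from .loc, which is probably the desired behavior for a lookup like this.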