The following code generates two DataFrames:
frame1=pd.DataFrame({'dates':['2023-01-01','2023-01-07','2023-01-09'],'values':[0,18,28]})
frame1['dates']=pd.to_datetime(frame1['dates'])
frame1=frame1.set_index('dates')
frame2=pd.DataFrame({'dates':['2023-01-08','2023-01-12'],'values':[8,12]})
frame2['dates']=pd.to_datetime(frame2['dates'])
frame2=frame2.set_index('dates')
Using
frame1.asfreq('D').interpolate()
frame2.asfreq('D').interpolate()
we can interpolate their values between the days to obtain
However, consider now the concatenation table:
frame1['frame']='f1'
frame2['frame']='f2'
concat=pd.concat([frame1,frame2])
concat=concat.set_index('frame',append=True)
concat=concat.reorder_levels(['frame','dates'])
concat
I want to do the interpolation using one command like
concat.groupby('frame').apply(lambda g:g.asfreq('D').interpolate())
direktly in the concatenation table. Unfortunately, my above command does not work but raises a TypeError
:
TypeError: Cannot convert input [('f1', Timestamp('2023-01-01 00:00:00'))] of type <class 'tuple'> to Timestamp
How do I fix that command to work?
CodePudding user response:
You have to drop the first level index (the group key) before use asfreq
like your initial dataframes:
>>> concat.groupby('frame').apply(lambda g: g.loc[g.name].asfreq('D').interpolate())
values
frame dates
f1 2023-01-01 0.0
2023-01-02 3.0
2023-01-03 6.0
2023-01-04 9.0
2023-01-05 12.0
2023-01-06 15.0
2023-01-07 18.0
2023-01-08 23.0
2023-01-09 28.0
f2 2023-01-08 8.0
2023-01-09 9.0
2023-01-10 10.0
2023-01-11 11.0
2023-01-12 12.0
To debug use a named function instead of a lambda function:
def interpolate(g):
print(f'[Group {g.name}]')
print(g.loc[g.name])
print()
return g.loc[g.name].asfreq('D').interpolate()
out = concat.groupby('frame').apply(interpolate)
Output:
[Group f1]
values
dates
2023-01-01 0
2023-01-07 18
2023-01-09 28
[Group f2]
values
dates
2023-01-08 8
2023-01-12 12