Interpolation using `asfreq('D')` in Multiindex-CodePudding

The following code generates two DataFrames:

frame1=pd.DataFrame({'dates':['2023-01-01','2023-01-07','2023-01-09'],'values':[0,18,28]})
frame1['dates']=pd.to_datetime(frame1['dates'])
frame1=frame1.set_index('dates')

frame2=pd.DataFrame({'dates':['2023-01-08','2023-01-12'],'values':[8,12]})
frame2['dates']=pd.to_datetime(frame2['dates'])
frame2=frame2.set_index('dates')

Using

frame1.asfreq('D').interpolate()
frame2.asfreq('D').interpolate()

we can interpolate their values between the days to obtain

and

However, consider now the concatenation table:

frame1['frame']='f1'
frame2['frame']='f2'
concat=pd.concat([frame1,frame2])
concat=concat.set_index('frame',append=True)
concat=concat.reorder_levels(['frame','dates'])
concat

I want to do the interpolation using one command like

concat.groupby('frame').apply(lambda g:g.asfreq('D').interpolate())

direktly in the concatenation table. Unfortunately, my above command does not work but raises a TypeError:

TypeError: Cannot convert input [('f1', Timestamp('2023-01-01 00:00:00'))] of type <class 'tuple'> to Timestamp

How do I fix that command to work?

CodePudding user response：

You have to drop the first level index (the group key) before use asfreq like your initial dataframes:

>>> concat.groupby('frame').apply(lambda g: g.loc[g.name].asfreq('D').interpolate())

                  values
frame dates             
f1    2023-01-01     0.0
      2023-01-02     3.0
      2023-01-03     6.0
      2023-01-04     9.0
      2023-01-05    12.0
      2023-01-06    15.0
      2023-01-07    18.0
      2023-01-08    23.0
      2023-01-09    28.0
f2    2023-01-08     8.0
      2023-01-09     9.0
      2023-01-10    10.0
      2023-01-11    11.0
      2023-01-12    12.0

To debug use a named function instead of a lambda function:

def interpolate(g):
    print(f'[Group {g.name}]')
    print(g.loc[g.name])
    print()
    return g.loc[g.name].asfreq('D').interpolate()

out = concat.groupby('frame').apply(interpolate)

Output:

[Group f1]
            values
dates             
2023-01-01       0
2023-01-07      18
2023-01-09      28

[Group f2]
            values
dates             
2023-01-08       8
2023-01-12      12