Home > Software design >  How to set a multiindex with multiple dates in pandas?
How to set a multiindex with multiple dates in pandas?

Time:10-08

I have the following dataframe df:

Datetime1              Datetime2             Value
2018-01-01 00:00       2018-01-01 01:00      5
2018-01-01 01:00       2018-01-01 02:00      1
2018-01-01 02:00       2018-01-01 03:00      2
2018-01-01 03:00       2018-01-01 04:00      3
2018-01-01 04:00       2018-01-01 05:00      6

I want to set a multi index composed of Datetime1 and Datetime2 to further proceed with the data resampling and interpolation (from 1 hour to 30 minutes frequency).

If I do df.set_index(["Datetime1","Datetime2"]).resample("30T").ffill(), then it fails.

Desired output:

Datetime1              Datetime2             Value
2018-01-01 00:00       2018-01-01 01:00      5
2018-01-01 00:30       2018-01-01 01:30      5
2018-01-01 01:00       2018-01-01 02:00      1
2018-01-01 01:30       2018-01-01 02:30      1
...

CodePudding user response:

If there is one hour difference is possible create MultiIndex after resample with add 1H to new DatetimeIndex:

df = df.set_index(["Datetime1"])[['Value']].resample("30T").ffill()
df = df.set_index([df.index.rename('Datetime2')   pd.Timedelta('1H')], append=True)

print (df)
                                         Value
Datetime1           Datetime2                 
2018-01-01 00:00:00 2018-01-01 01:00:00      5
2018-01-01 00:30:00 2018-01-01 01:30:00      5
2018-01-01 01:00:00 2018-01-01 02:00:00      1
2018-01-01 01:30:00 2018-01-01 02:30:00      1
2018-01-01 02:00:00 2018-01-01 03:00:00      2
2018-01-01 02:30:00 2018-01-01 03:30:00      2
2018-01-01 03:00:00 2018-01-01 04:00:00      3
2018-01-01 03:30:00 2018-01-01 04:30:00      3
2018-01-01 04:00:00 2018-01-01 05:00:00      6

Or:

s = df.set_index(["Datetime1"])['Value'].resample("30T").ffill()
s.index = [s.index,s.index.rename('Datetime2')   pd.Timedelta('1H')]

print (s)
Datetime1            Datetime2          
2018-01-01 00:00:00  2018-01-01 01:00:00    5
2018-01-01 00:30:00  2018-01-01 01:30:00    5
2018-01-01 01:00:00  2018-01-01 02:00:00    1
2018-01-01 01:30:00  2018-01-01 02:30:00    1
2018-01-01 02:00:00  2018-01-01 03:00:00    2
2018-01-01 02:30:00  2018-01-01 03:30:00    2
2018-01-01 03:00:00  2018-01-01 04:00:00    3
2018-01-01 03:30:00  2018-01-01 04:30:00    3
2018-01-01 04:00:00  2018-01-01 05:00:00    6
Name: Value, dtype: int64

CodePudding user response:

The multi-index is not meant for a double-index but for a hierarchical (grouped) index. See the docs. You said in the comments, that Datetime2 is always offset by 1 hour. That means it's probably fastest to recalculate it:

df.set_index("Datetime1","Datetime2").resample("30T").ffill()
df["Datetime2" = df.index   pd.Timedelta(1, "hour")
  • Related