How to convert to timeseries based on column values as duration

Time:11-14

I have data with a timestamp and a duration (in minutes) for each operation. I want to convert it into a 1-minute time series, repeating each row for the number of minutes given in its duration column and leaving the remaining rows as NaN where the data is not continuous. Data:

datetime           action  duration
2022-01-01 00:00      3      40
2022-01-01 00:40      1      10
2022-01-01 02:34      5      50

Desired outcome:

datetime           action  duration
2022-01-01 00:00      3      40
2022-01-01 00:01      3      40
...
2022-01-01 00:39      3      40
2022-01-01 00:40      1      10
...
2022-01-01 00:49      1      10
2022-01-01 00:50      NaN    NaN
2022-01-01 00:51      NaN    NaN
...
2022-01-01 02:34      5      50
2022-01-01 02:35      5      50

I've tried df.resample("1min").fillna("pad"), but it fills every in-between minute with the most recent row. Each action should be filled only for its duration, with NaN after that.

How can I achieve this?

CodePudding user response:

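Try this:

import pandas as pd

tmp = df.copy()
tmp['datetime'] = pd.to_datetime(tmp['datetime'])
# Expand each row into one timestamp per minute of its duration.
tmp['datetime'] = tmp.apply(lambda x: pd.date_range(
    x['datetime'], periods=x['duration'], freq='1min'), axis=1)
tmp = tmp.explode('datetime')
# explode leaves an object column, so convert back before indexing.
tmp['datetime'] = pd.to_datetime(tmp['datetime'])
tmp = tmp.set_index('datetime')
# Reindex onto a gap-free 1-minute grid; minutes no action covers become NaN.
res = tmp.reindex(pd.date_range(tmp.index.min(), tmp.index.max(), freq='1min'))
print(res)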
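
If the row-wise apply is slow on a larger frame, the same idea can probably be done without it. This is only a rough sketch: it rebuilds the sample data from the question, assumes duration is in whole minutes and that actions never overlap, and expand_by_duration is a made-up helper name, not a pandas function.

```python
import pandas as pd

# Sample data from the question; duration is assumed to be whole minutes.
df = pd.DataFrame({
    "datetime": pd.to_datetime(
        ["2022-01-01 00:00", "2022-01-01 00:40", "2022-01-01 02:34"]),
    "action": [3, 1, 5],
    "duration": [40, 10, 50],
})


def expand_by_duration(frame):
    """Hypothetical helper: one row per minute covered by each action."""
    # Repeat every row `duration` times.
    out = frame.loc[frame.index.repeat(frame["duration"])].copy()
    # Position of each copy within its original row: 0, 1, 2, ...
    offset = out.groupby(level=0).cumcount()
    out["datetime"] = out["datetime"] + pd.to_timedelta(offset, unit="min")
    out = out.set_index("datetime")
    # Reindex onto a gap-free 1-minute grid; uncovered minutes become NaN.
    grid = pd.date_range(out.index.min(), out.index.max(), freq="1min")
    return out.reindex(grid).rename_axis("datetime")


res = expand_by_duration(df)
print(res.loc["2022-01-01 00:38":"2022-01-01 00:51"])
```

The reindex step raises on duplicate timestamps, so overlapping actions would need to be resolved first.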

CodePudding user response:

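Try just setting the frequency of the DataFrame's index (datetime has to be the index, as a DatetimeIndex):

df = df.asfreq('1min')

This reindexes the frame onto a 1-minute grid and puts NaN wherever no row exists, so no fillna is required. Note that asfreq on its own does not repeat each action for its duration; only the original timestamps keep their values.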
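
A quick way to see what this produces on the question's sample data is to count the rows. This is a sketch only; the frame is rebuilt by hand here and the variable names are just for illustration.

```python
import pandas as pd

# Sample data from the question, with datetime as a DatetimeIndex.
df = pd.DataFrame({
    "datetime": pd.to_datetime(
        ["2022-01-01 00:00", "2022-01-01 00:40", "2022-01-01 02:34"]),
    "action": [3, 1, 5],
    "duration": [40, 10, 50],
}).set_index("datetime")

# 1-minute grid from the first to the last original timestamp.
grid = df.asfreq("1min")

n_rows = len(grid)                            # minutes from 00:00 to 02:34
n_filled = int(grid["action"].notna().sum())  # rows that still hold values
print(n_rows, n_filled)  # 155 3
```

The grid also stops at the last original timestamp (02:34), so the minutes covered by the final action's duration are not included.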
