I have data with a timestamp and a duration (in minutes) for each operation. I want to convert it into a 1-minute time series, repeating each row for the number of minutes given in the duration column and leaving the remaining rows NaN where the operations are not contiguous. Data:
datetime action duration
2022-01-01 00:00 3 40
2022-01-01 00:40 1 10
2022-01-01 02:34 5 50
Desired outcome:
datetime action duration
2022-01-01 00:00 3 40
2022-01-01 00:01 3 40
...
2022-01-01 00:39 3 40
2022-01-01 00:40 1 10
...
2022-01-01 00:49 1 10
2022-01-01 00:50 NaN NaN
2022-01-01 00:51 NaN NaN
...
2022-01-01 02:34 5 50
2022-01-01 02:35 5 50
I've tried: df.resample("1min").fillna("pad")
but it forward-fills every gap with the most recent row. Each action should be repeated only for the number of minutes in its duration, and the minutes after that should stay NaN.
How can I achieve this?
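For reference, a minimal reproduction of the attempt above. It assumes the sample data is loaded with datetime as a DatetimeIndex, and uses Resampler.ffill(), which pads in the same way as fillna("pad"):

```python
import pandas as pd

# Sample data from the question, with datetime as the index (assumed layout).
df = pd.DataFrame(
    {"action": [3, 1, 5], "duration": [40, 10, 50]},
    index=pd.to_datetime(
        ["2022-01-01 00:00", "2022-01-01 00:40", "2022-01-01 02:34"]
    ),
)

# Forward-filling after resampling ignores the duration column entirely.
padded = df.resample("1min").ffill()

# 00:50 is past the end of the 10-minute operation that started at 00:40,
# yet it still carries that operation's values instead of NaN.
print(padded.loc[pd.Timestamp("2022-01-01 00:50")])
```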
CodePudding user response:
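Try this. It expands each row into one timestamp per minute of its duration, then overlays the result on an all-NaN 1-minute grid:
import pandas as pd
# Expand each row into one timestamp per minute of its duration.
tmp = df.copy()
tmp['datetime'] = [list(pd.date_range(start, periods=n, freq='1min'))
                   for start, n in zip(tmp['datetime'], tmp['duration'])]
tmp = tmp.explode('datetime')
tmp['datetime'] = pd.to_datetime(tmp['datetime'])
tmp = tmp.set_index('datetime')
# Build an all-NaN frame indexed by the original timestamps.
df['datetime'] = pd.to_datetime(df['datetime'])
df = df.set_index('datetime').astype(float)  # float columns can hold NaN
df[:] = float('nan')
# Resample to 1 minute, then overlay the expanded rows; uncovered minutes stay NaN.
res = df.resample(rule='1min').ffill().combine_first(tmp)
print(res)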
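To check the idea against the sample data from the question, here is a self-contained sketch of the same explode approach. It swaps the combine_first step for a plain reindex onto a full minute grid, and it assumes that durations are whole minutes and that operations never overlap (reindex needs unique timestamps):

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "datetime": ["2022-01-01 00:00", "2022-01-01 00:40", "2022-01-01 02:34"],
    "action": [3, 1, 5],
    "duration": [40, 10, 50],
})
df["datetime"] = pd.to_datetime(df["datetime"])

# One list of minute stamps per operation, then one row per stamp.
expanded = df.assign(
    datetime=[
        list(pd.date_range(start, periods=n, freq="1min"))
        for start, n in zip(df["datetime"], df["duration"])
    ]
).explode("datetime", ignore_index=True)
expanded["datetime"] = pd.to_datetime(expanded["datetime"])
expanded = expanded.set_index("datetime")

# Reindex onto a gap-free minute grid; minutes outside any operation become NaN.
grid = pd.date_range(expanded.index.min(), expanded.index.max(), freq="1min")
res = expanded.reindex(grid)
print(res)
```

The result runs from 00:00 to 03:23, because the last operation starts at 02:34 and lasts 50 minutes.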
CodePudding user response:
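Try setting the frequency of the DataFrame's datetime index:
df = df.asfreq('1min')
This reindexes the data onto a 1-minute grid and inserts NaN rows wherever no value is present, so no fillna is required. Note that it needs datetime to be the index, and on its own it does not repeat each row for its duration.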
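Putting the data on a 1-minute grid leaves every inserted row NaN, including the minutes that fall inside an operation. A minimal sketch of one way to fill those, assuming datetime is already the index and duration is in whole minutes: forward-fill everything, then blank out rows that lie past the end of the most recent operation. The names ends, grid, start, elapsed_min and inside are only illustrative.

```python
import pandas as pd

# Sample data from the question, with datetime as the index.
df = pd.DataFrame(
    {"action": [3, 1, 5], "duration": [40, 10, 50]},
    index=pd.to_datetime(
        ["2022-01-01 00:00", "2022-01-01 00:40", "2022-01-01 02:34"]
    ),
)

# Extend the grid to the last minute covered by any operation
# (asfreq alone would stop at the last original timestamp).
ends = df.index + pd.to_timedelta(df["duration"].to_numpy(), unit="min")
grid = pd.date_range(
    df.index.min(), ends.max() - pd.Timedelta(minutes=1), freq="1min"
)

# Forward-fill the values and the start time of the most recent operation.
filled = df.reindex(grid).ffill()
start = pd.Series(df.index, index=df.index).reindex(grid).ffill()

# Keep a row only while fewer than `duration` minutes have elapsed.
elapsed_min = (grid.to_series() - start).dt.total_seconds() / 60
inside = elapsed_min < filled["duration"]
res = filled.copy()
res.loc[~inside] = float("nan")
print(res)
```

Unlike the explode approach, this never builds one list per operation, at the cost of assuming that a new operation always supersedes the previous one.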