I have data with timestamps: users do tasks, and the timestamp of each task is recorded. Each user is identified by a 'uid'. I want to convert this data into a time series with 10-minute granularity, separately for each user, so the timestamps run in chronological order for uid=1, then for uid=2, and so on.
From:
timestamp uid var
2020-01-01 10:00 1 10
2020-01-01 10:04 2 20
2020-01-01 20:02 2 15
2020-01-01 21:20 1 10
...
2020-01-15 23:12 1 5
To:
timestamp uid var
2020-01-01 10:00 1 10
2020-01-01 10:10 1 NaN
2020-01-01 10:20 1 NaN
...
2020-01-15 23:10 1 5
2020-01-01 10:00 2 20
2020-01-01 10:10 2 NaN
2020-01-01 10:20 2 NaN
...
Edit for next step:
I want to fill the missing values for 1 hour after each 'var' entry, like this:
timestamp uid var
2020-01-01 10:00 1 10
2020-01-01 10:10 1 10
2020-01-01 10:20 1 10
2020-01-01 10:30 1 10
2020-01-01 10:40 1 10
2020-01-01 10:50 1 10
2020-01-01 11:00 1 NaN
2020-01-01 11:10 1 NaN
2020-01-01 11:20 1 NaN
I tried:
pd.concat([pd.DataFrame({'timestamp': [df['timestamp'].min()-pd.Timedelta('1h')]}), df])
.set_index('timestamp').bfill(1*6).reset_index()
but it raises:
ValueError: cannot reindex a non-unique index with a method or limit
presumably because the 'timestamp' index now contains duplicate values.
Answer:
Group by the 'uid' column and resample each group into 10-minute bins:
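import pandas as pd

df['timestamp'] = pd.to_datetime(df['timestamp'])
out = (df.set_index('timestamp')      # resample needs a DatetimeIndex
         .groupby('uid')['var']
         .resample('10min')           # '10T' is deprecated in recent pandas
         .sum(min_count=1)            # empty bins -> NaN (replacing 0 would also hit real zeros)
         .reset_index())              # columns: uid, timestamp, var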
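The resampling above produces the 10-minute grid but does not cover the one-hour fill from the edit. A minimal sketch of that step, assuming "fill for 1 hour" means the entry's own bin plus the next five (10:00 to 10:50, with 11:00 left as NaN, as in the expected output) and that several entries in one bin should be summed, as in the answer's `.sum()`. Filling per `uid` with `groupby(...).ffill(limit=...)` means each user's timestamps are handled on their own, so no index with duplicate timestamps is ever reindexed. The names `BINS_PER_HOUR` and `var_filled` are hypothetical, chosen for illustration; the sample rows come from the question.

```python
import pandas as pd

# Sample rows from the question; the real data has many more rows per user.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2020-01-01 10:00", "2020-01-01 10:04",
        "2020-01-01 20:02", "2020-01-01 21:20",
        "2020-01-15 23:12",
    ]),
    "uid": [1, 2, 2, 1, 1],
    "var": [10, 20, 15, 10, 5],
})

# Step 1: one 10-minute grid per user; bins with no entries become NaN.
out = (df.set_index("timestamp")
         .groupby("uid")["var"]
         .resample("10min")
         .sum(min_count=1)
         .reset_index())

# Step 2 (hypothetical names): carry each value forward for the rest of its hour.
# 1 hour / 10 minutes = 6 bins; the entry occupies the first, so fill at most 5 more.
BINS_PER_HOUR = pd.Timedelta("1h") // pd.Timedelta("10min")
out["var_filled"] = out.groupby("uid")["var"].ffill(limit=BINS_PER_HOUR - 1)

print(out.head(8))
```

Keeping the raw `var` column next to `var_filled` makes it easy to see which rows were filled. If a new entry arrives within the hour, `ffill` restarts from that entry, so its value is carried forward instead of the older one.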