How to convert data into timeseries for column groups

Time:11-10

I have timestamped data: users perform tasks, and the time of each task is recorded. Each user is identified by a 'uid'. I want to convert this data into a time series at 10-minute granularity, separately for each user, so that the timestamps run in chronological order for uid=1, then for uid=2, and so on.

From:

timestamp          uid    var
2020-01-01 10:00    1     10
2020-01-01 10:04    2     20 
2020-01-01 20:02    2     15
2020-01-01 21:20    1     10
.. 
2020-01-15 23:12    1     5

To:

timestamp           uid   var
2020-01-01 10:00     1    10
2020-01-01 10:10     1    NaN
2020-01-01 10:20     1    NaN
...
2020-01-15 23:10     1    5
2020-01-01 10:00     2    20
2020-01-01 10:10     2    NaN
2020-01-01 10:20     2    NaN
...

Edit for next step:

I want to fill missing values for 1 hour after each 'var' entry. So:

timestamp           uid   var
2020-01-01 10:00     1    10
2020-01-01 10:10     1    10
2020-01-01 10:20     1    10
2020-01-01 10:30     1    10
2020-01-01 10:40     1    10
2020-01-01 10:50     1    10
2020-01-01 11:00     1    NaN
2020-01-01 11:10     1    NaN
2020-01-01 11:20     1    NaN

I tried:

pd.concat([pd.DataFrame({'timestamp': [df['timestamp'].min()-pd.Timedelta('1h')]}), df])
 .set_index('timestamp').bfill(1*6).reset_index()

but it gives

ValueError: cannot reindex a non-unique index with a method or limit

because we created duplicates in timestamp.

CodePudding user response:

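Group by the uid column and resample each group to 10-minute bins. resample needs a datetime index, so parse the timestamp column and set it as the index first. Using sum(min_count=1) leaves empty bins as NaN directly, which avoids replacing 0 with NaN afterwards (that would also wipe out genuine zeros).

import pandas as pd

# resample needs a DatetimeIndex, so parse the timestamps and index by them
df['timestamp'] = pd.to_datetime(df['timestamp'])

(df.set_index('timestamp')
 .groupby('uid')['var']
 .resample('10min')      # '10T' is the older alias for 10 minutes
 .sum(min_count=1)       # bins with no entries become NaN instead of 0
 .reset_index(level=0))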
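
For the follow-up step in the question (carrying each value forward for the next hour), one possible approach is a grouped forward fill with a limit. This is a minimal sketch, not a definitive solution: the sample frame contains only the rows shown in the question, and the name `ts` is made up for illustration. A limit of 5 fills the five bins after each entry (10:10 through 10:50), which matches the expected output where 11:00 stays NaN.

```python
import pandas as pd

# Sample rows copied from the question (the elided rows are left out).
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2020-01-01 10:00", "2020-01-01 10:04",
        "2020-01-01 20:02", "2020-01-01 21:20",
    ]),
    "uid": [1, 2, 2, 1],
    "var": [10, 20, 15, 10],
})

# Step 1: build a 10-minute grid per uid; bins without entries stay NaN.
ts = (df.set_index("timestamp")
        .groupby("uid")["var"]
        .resample("10min")
        .sum(min_count=1)
        .reset_index())

# Step 2: forward-fill within each uid, at most 5 bins (50 minutes)
# after each recorded value, so the bin one hour later remains NaN.
ts["var"] = ts.groupby("uid")["var"].ffill(limit=5)

print(ts.head(8))
```

Grouping by uid before the fill keeps one user's last value from leaking into the first rows of the next user.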