I have a table with the following structure; the count column is updated every time a user accesses the app again on that date.
user_id | date | count |
---|---|---|
1 | 1/1/2021 | 4 |
2 | 1/1/2021 | 7 |
1 | 1/2/2021 | 3 |
3 | 1/2/2021 | 10 |
2 | 1/3/2021 | 4 |
4 | 1/1/2021 | 12 |
I want to de-aggregate this data based on the count column, so that, for example, user_id 1 has four records on 1/1/2021, with the count column dropped. After that, I want to append a random time to each date. My output would look like this:
user_id | date_time |
---|---|
1 | 1/1/2021 16:00:21 |
1 | 1/1/2021 7:23:55 |
1 | 1/1/2021 12:01:45 |
1 | 1/1/2021 21:21:07 |
I'm using pandas for this. Randomizing the timestamps is straightforward, I think; it's de-aggregating the data based on a column that is a little tricky for me.
CodePudding user response:
You can repeat each row's index label according to count, then add a random offset between 0 and 24 hours to each date:
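import numpy as np
import pandas as pd

(df.loc[df.index.repeat(df['count'])]  # repeat each row `count` times
   .assign(date=lambda d: pd.to_datetime(d['date'])
           # add a random offset of 0 to 86399 seconds to each date
           + pd.to_timedelta(np.random.randint(0, 24*3600, size=len(d)), unit='s'))
   .rename(columns={'date': 'date_time'})
   .drop('count', axis=1)
)
output:
user_id date_time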
0 1 2021-01-01 03:32:40
0 1 2021-01-01 03:54:18
0 1 2021-01-01 00:57:49
0 1 2021-01-01 13:04:08
1 2 2021-01-01 00:34:03
1 2 2021-01-01 00:14:17
1 2 2021-01-01 03:57:20
1 2 2021-01-01 22:01:11
1 2 2021-01-01 22:09:55
1 2 2021-01-01 13:15:36
1 2 2021-01-01 12:26:39
2 1 2021-01-02 22:51:17
2 1 2021-01-02 13:44:12
2 1 2021-01-02 01:39:14
3 3 2021-01-02 09:22:16
3 3 2021-01-02 03:34:15
3 3 2021-01-02 23:05:49
3 3 2021-01-02 02:21:35
3 3 2021-01-02 19:51:41
3 3 2021-01-02 16:02:20
3 3 2021-01-02 18:14:05
3 3 2021-01-02 09:07:14
3 3 2021-01-02 22:43:44
3 3 2021-01-02 20:48:15
4 2 2021-01-03 19:25:04
4 2 2021-01-03 14:08:03
4 2 2021-01-03 21:23:58
4 2 2021-01-03 17:24:58
5 4 2021-01-01 23:37:41
5 4 2021-01-01 06:06:17
5 4 2021-01-01 19:23:29
5 4 2021-01-01 02:12:50
5 4 2021-01-01 08:09:59
5 4 2021-01-01 03:49:30
5 4 2021-01-01 08:00:42
5 4 2021-01-01 08:03:34
5 4 2021-01-01 15:36:12
5 4 2021-01-01 14:50:43
5 4 2021-01-01 14:54:04
5 4 2021-01-01 14:58:08
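For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. It rebuilds the sample table from the question and makes a few assumptions that are not in the original answer: the dates are read as month/day/year, a seeded generator (the seed 0 is arbitrary) is used so the random times are reproducible, and the index is reset so that the repeated labels do not carry over. The name expanded is just illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "user_id": [1, 2, 1, 3, 2, 4],
    "date": ["1/1/2021", "1/1/2021", "1/2/2021", "1/2/2021", "1/3/2021", "1/1/2021"],
    "count": [4, 7, 3, 10, 4, 12],
})

# Assumption: a seeded generator, so reruns give the same random times.
rng = np.random.default_rng(0)

# Repeat each row `count` times, drop the count column, and renumber the rows.
expanded = (
    df.loc[df.index.repeat(df["count"])]
      .drop(columns="count")
      .reset_index(drop=True)
)

# One random offset per row, in whole seconds within [0, 86400).
offsets = pd.to_timedelta(rng.integers(0, 24 * 3600, size=len(expanded)), unit="s")

# Assumption: dates are month/day/year, as the answer's output suggests.
expanded["date_time"] = pd.to_datetime(expanded["date"], format="%m/%d/%Y") + offsets
expanded = expanded.drop(columns="date")

print(expanded.head())
```

The resulting frame has one row per access (40 rows for the sample data), with only user_id and date_time columns. If you need a different time distribution, only the offsets line should have to change.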