my dataframe looks something as follows
user action created_at
user1 A 2021-12-15 11:58:53.217258
user1 A 2021-12-15 11:50:27.864876
user1 B 2021-12-15 14:23:27.864876
user1 A 2021-12-15 14:23:27.864876
user3 C 2021-12-15 12:48:09.044315
user3 D 2021-12-15 12:48:09.044315
user4 A 2021-12-15 14:48:09.044315
i want something as following. where total action is the frequency of of a action
11-12 12-13 ... 14-15
user total_action action
user1 3 A 2 0 1
user1 1 B 0 0 1
user3 1 C 0 1 0
user4 1 D 0 1 0
user5 1 A 0 0 1
here 11-12, 12-13 are the time intervals(could be in string format, no issues there). I have tried looking at pd.Grouper but that also isn't helping much.
CodePudding user response:
This is pd.crosstab
:
(pd.crosstab([df.user, df.action],df.created_at.dt.hour,
margins=True, margins_name='total_action')
.iloc[:-1] # remove column total
)
Output:
created_at 11 12 14 total_action
user action
user1 A 2 0 1 3
B 0 0 1 1
user3 C 0 1 0 1
D 0 1 0 1
user4 A 0 0 1 1
Or you can use groupby
with grouper like:
(df.groupby(['user','action', pd.Grouper(key='created_at', freq='H')])
.size().unstack(fill_value=0)
.reset_index()
)
Output:
created_at user action 2021-12-15 11:00:00 2021-12-15 14:00:00 2021-12-15 12:00:00
0 user1 A 2 1 0
1 user1 B 0 1 0
2 user3 C 0 0 1
3 user3 D 0 0 1
4 user4 A 0 1 0
and you can add total_action
column.