My source df looks like this:
EventType User Item
View A 1
View B 1
Like C 2
View C 2
Buy A 1
There are 5 users in total: A B C D E
There are 6 items in total: 1 2 3 4 5 6
I would like to generate a new df like this:
Event_Type Event_Ratio ItemsHaveEvent UsersHaveEvent
View 0.6 0.33 0.6
Like 0.2 0.167 0.2
Buy 0.2 0.167 0.2
Event_Type: same as EventType in the original df
Event_Ratio: number of rows with this event / total number of events
ItemsHaveEvent: number of distinct items that have this event / total items
UsersHaveEvent: number of distinct users that have this event / total users
How can I write idiomatic pandas code in a declarative way to do this?
CodePudding user response:
One option is named aggregation:
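# df is the source DataFrame from the question
total_items = 6          # items 1..6, including those with no events
total_users = 5          # users A..E, including those with no events
total_events = len(df)   # one row per event

(df
 .groupby('EventType', sort=False, as_index=False)
 .agg(
     # share of all events that belong to this event type
     EventRatio=('EventType', lambda f: f.size / total_events),
     # distinct items with this event, over all items
     ItemsHaveEvent=('Item', lambda f: f.nunique() / total_items),
     # distinct users with this event, over all users
     UsersHaveEvent=('User', lambda f: f.nunique() / total_users))
)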
  EventType  EventRatio  ItemsHaveEvent  UsersHaveEvent
0      View         0.6        0.333333             0.6
1      Like         0.2        0.166667             0.2
2      Buy          0.2        0.166667             0.2
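
As a variation on the same idea, here is a minimal self-contained sketch that rebuilds the sample frame from the question and avoids the lambdas: it aggregates raw counts with the built-in 'size' and 'nunique' reducers, then divides each column by its total in one step. The names counts, totals and result are made up for this illustration, and the totals are hard-coded as in the question because users D and E and items 3 to 6 never appear in the data.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "EventType": ["View", "View", "Like", "View", "Buy"],
    "User": ["A", "B", "C", "C", "A"],
    "Item": [1, 1, 2, 2, 1],
})

# Totals are given in the question, not derived from df
total_users = 5
total_items = 6
total_events = len(df)

# Step 1: raw counts per event type, using built-in reducers
counts = df.groupby("EventType", sort=False).agg(
    EventRatio=("User", "size"),          # rows per event type
    ItemsHaveEvent=("Item", "nunique"),   # distinct items per event type
    UsersHaveEvent=("User", "nunique"),   # distinct users per event type
)

# Step 2: divide each column by its own denominator (aligned by column name)
totals = pd.Series({
    "EventRatio": total_events,
    "ItemsHaveEvent": total_items,
    "UsersHaveEvent": total_users,
})
result = counts.div(totals, axis="columns").reset_index()
print(result)
```

Splitting the counting from the normalisation keeps the aggregation on pandas' built-in reducers, which tends to matter more as the number of groups grows; on a frame this small either form is fine.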