Home > database >  pandas groupby and find most frequent value (mode)
pandas groupby and find most frequent value (mode)

Time:11-10

I have a datframe that looks like this

user_id product_id  created_at
    1   100         2019-04-21 20:20:00
    1   100         2019-04-23 00:10:00
    1   200         2019-05-24 10:00:00
    1   200         2020-06-24 10:10:24
    2   100         2019-01-22 21:10:00
    2   200         2019-04-25 20:23:30
    2   300         2021-01-21 10:20:00
    3   400         2019-12-21 10:20:00
    3   400         2021-04-21 10:20:00

I'm trying to find the most bought product_id for each user_id. I know I can use groupby along with value_counts or pd.Series.mode to get this. something like this:

df.groupby(['user_id'])['product_id',].apply(lambda x: x.value_counts().index[0]).reset_index()

This works fine until I reach a user_id having two or more product_id with the same count( like in the sample dataframe). When this happens, ideally I'd want to select the product_id with the most recent created_at. So for eg. for user_id 1, I'd want to select product_id 200 as it was last bought more recently.

What would be the most optimal way to achieve this?

CodePudding user response:

You can calculate both count and max on dates, then sort on these values and drop duplicates (or use groupby().head()):

s = df.groupby(['user_id','product_id'])['created_at'].agg(['count','max'])
s.sort_values(['count','max'], ascending=False).groupby('user_id').head(1)

Output:

                    count                  max
user_id product_id                            
3       400             2  2021-04-21 10:20:00
1       200             2  2020-06-24 10:10:24
2       300             1  2021-01-21 10:20:00

CodePudding user response:

df.\
sort_values("created_at", ascending=False).\
groupby(["user_id", "product_id"], sort=False, as_index=False).\
count().\
groupby("user_id", as_index=False).\
head(1)
#   user_id  product_id  created_at
# 0       1         200           2
# 1       2         300           1
# 2       3         400           2
  • Related