How to aggregate in pandas with some conditions?

Time:07-09

I want to aggregate my data in this way:

df.groupby('date').agg({ 'user_id':'nunique',
                         'user_id':'nunique' ONLY WHERE purchase_flag==1})

date     | user_id | purchase_flag
4-1-2020 | 1       | 1
4-1-2020 | 1       | 1 (purchased second time but still same unique user on that day)
4-1-2020 | 2       | 0

In this case I want the output to look like:

date     | total_users | total_users_who_purchased
4-1-2020 | 2           | 1

How can I best achieve this?

CodePudding user response:

Try creating a helper column that holds user_id only on rows where a purchase was made (where() leaves NaN on the other rows, and nunique ignores NaN), then group by date and aggregate on both columns:

df["user_id_purchased"] = df["user_id"].where(df["purchase_flag"].astype(bool))
df_output = df.groupby("date", as_index=False).agg(
    total_users=("user_id", "nunique"),
    total_users_who_purchased=("user_id_purchased", "nunique"),
)

Output:

       date  total_users  total_users_who_purchased
0  4-1-2020            2                          1
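
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same helper-column approach. The DataFrame is rebuilt from the three sample rows in the question; nothing else is assumed beyond pandas 0.25 or later, which named aggregation requires.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "date": ["4-1-2020", "4-1-2020", "4-1-2020"],
    "user_id": [1, 1, 2],
    "purchase_flag": [1, 1, 0],
})

# where() keeps user_id on purchase rows and puts NaN on all other rows.
df["user_id_purchased"] = df["user_id"].where(df["purchase_flag"].astype(bool))

# nunique() skips NaN by default (dropna=True), so non-purchasers are not counted.
df_output = df.groupby("date", as_index=False).agg(
    total_users=("user_id", "nunique"),
    total_users_who_purchased=("user_id_purchased", "nunique"),
)
print(df_output)
```

The helper column is what lets a single groupby produce both counts: the condition is encoded in the data instead of in the aggregation function.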

CodePudding user response:

One way to achieve this is to filter the purchase rows with .loc and count the unique users:

df.loc[df["purchase_flag"] == 1].user_id.nunique()

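Implementation to get your output (note that it counts over the whole DataFrame and hardcodes the single date from the example, so it does not group by date):

import pandas as pd

# Build a one-row result; both counts are taken over the entire DataFrame.
details = {'date': ['4-1-2020'],
           'total_users': df.user_id.nunique(),
           'total_users_who_purchased':
               df.loc[df["purchase_flag"] == 1].user_id.nunique()}

df2 = pd.DataFrame(details)

df2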
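
If the data spans more than one date, the same .loc filter can be combined with groupby so that each date gets its own row. This is only a sketch under some assumptions: the fourth row (date 4-2-2020, user 3, no purchase) is hypothetical and was added to show a day without purchasers, and filling missing counts with 0 is a choice made here, not something the question specifies.

```python
import pandas as pd

# First three rows are from the question; the last row is a hypothetical extra day.
df = pd.DataFrame({
    "date": ["4-1-2020", "4-1-2020", "4-1-2020", "4-2-2020"],
    "user_id": [1, 1, 2, 3],
    "purchase_flag": [1, 1, 0, 0],
})

# Unique users per date, regardless of purchases.
total_users = df.groupby("date")["user_id"].nunique().rename("total_users")

# Unique users per date, restricted to purchase rows via .loc.
purchasers = (
    df.loc[df["purchase_flag"] == 1]
    .groupby("date")["user_id"]
    .nunique()
    .rename("total_users_who_purchased")
)

# Left join keeps every date; dates with no purchasers come back as NaN, so fill with 0.
df2 = (
    total_users.to_frame()
    .join(purchasers)
    .fillna(0)
    .astype(int)
    .reset_index()
)
print(df2)
```

Compared with the helper-column answer, this leaves the original DataFrame untouched, at the cost of two groupby passes and a join.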