Home > Mobile >  pandas groupby agg get max from one column and bring along value from another column
pandas groupby agg get max from one column and bring along value from another column

Time:10-05

I have a dataframe of purchases:

product_id    count    timestamp           customer_id
1             1        2021-10-04 10:20    a
1             3        2021-10-04 10:21    b
2             4        2021-10-04 10:00    c
1             2        2021-10-03 10:00    c

I use the following groupby and agg to create a report of the sum and mean of the count, and the latest purchase timestamp.

report = (
    df.groupby(product_id).agg(
        sum=pd.NamedAgg(column="count", aggfunc="sum"),
        mean_count=pd.NamedAgg(column="count", aggfunc="mean"),
        latest_purchase_time=pd.NamedAgg(column="timestamp", aggfunc="max")
    )
)

I want to include in this report the customer_id that corresponds to the latest purchase timestamp. Is there a way to do this?

e.g.:

product_id    sum    mean_count    latest_purchase_timestamp   *customer_id*
1             6      2             2021-10-04 10:21            b
2             4      4             2021-10-04 10:00            c

CodePudding user response:

First convert customer_id to index, so possible get value by maximal timestamp by idxmax:

report = (
    df.set_index('customer_id')
      .groupby('product_id').agg(
        sum=pd.NamedAgg(column="count", aggfunc="sum"),
        mean_count=pd.NamedAgg(column="count", aggfunc="mean"),
        latest_purchase_time=pd.NamedAgg(column="timestamp", aggfunc="max"),
        customer_id=pd.NamedAgg(column="timestamp", aggfunc="idxmax")
    )
)
print (report)
            sum  mean_count latest_purchase_time customer_id
product_id                                                  
1             6           2  2021-10-04 10:21:00           b
2             4           4  2021-10-04 10:00:00           c
  • Related