How to mark max values of groups in a Pandas DataFrame?

Time:09-01

I have a pandas DataFrame and I would like to create a new boolean column that indicates whether another column's value is the maximum within a given group.

Let's say I have a list of purchases and their amounts for each customer:

import pandas as pd

df = pd.DataFrame(
    [
        ('A', "29.08.2022", 100),
        ('A', "30.08.2022", 200),
        ('A', "31.08.2022", 300),
        ('B', "27.08.2022", 50),
        ('B', "38.08.2022", 1000),
        ('B', "30.08.2022", 10),
    ],
    columns=["customer_id", "purchase_date", "amount"],
)

df

customer_id     purchase_date   amount
A               29.08.2022      100
A               30.08.2022      200
A               31.08.2022      300
B               27.08.2022      50
B               38.08.2022      1000
B               30.08.2022      10

I can find the maximum using

df.groupby('customer_id')['amount'].max()

which gives me the maximum for each customer:

customer_id
A     300
B    1000

but I would like to flag this maximum in my original data in a new column is_max, like this:

customer_id     purchase_date   amount  is_max
A               29.08.2022      100     false
A               30.08.2022      200     false
A               31.08.2022      300     true
B               27.08.2022      50      false
B               38.08.2022      1000    true
B               30.08.2022      10      false

How can I do this?

Answer:

Use groupby.transform('max'), which returns the group maximum aligned to every original row, and compare it with the amount column:

df['is_max'] = df['amount'].eq(df.groupby('customer_id')['amount'].transform('max'))

Output:

  customer_id purchase_date  amount  is_max
0           A    29.08.2022     100   False
1           A    30.08.2022     200   False
2           A    31.08.2022     300    True
3           B    27.08.2022      50   False
4           B    38.08.2022    1000    True
5           B    30.08.2022      10   False
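
One thing worth knowing about this approach: if several rows in a group share the maximum, all of them are flagged. The sketch below uses made-up data (customer A has two tied purchases, which is not the case in the question) to show that behavior, along with one possible way to flag only the first maximum per group via idxmax. The column name is_first_max is just an illustrative choice.

```python
import pandas as pd

# Hypothetical data: customer A has two purchases tied for the maximum.
df = pd.DataFrame(
    [("A", 100), ("A", 300), ("A", 300), ("B", 50), ("B", 1000)],
    columns=["customer_id", "amount"],
)

# transform('max') returns a Series aligned with df's index, so each row
# is compared against the maximum of its own group. Tied rows are all True.
group_max = df.groupby("customer_id")["amount"].transform("max")
df["is_max"] = df["amount"].eq(group_max)

# idxmax() returns the index label of the first maximum in each group,
# so at most one row per customer is flagged here.
first_max_idx = df.groupby("customer_id")["amount"].idxmax()
df["is_first_max"] = df.index.isin(first_max_idx)

print(df)
```

Which variant is appropriate depends on whether tied maxima should all count; the transform comparison is the simpler choice when they should.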