I have a pandas DataFrame and I would like to create a new boolean column that indicates whether another column's value is the maximum within its group.
Let's say I have a list of purchases and their amounts for each customer:
import pandas as pd
df = pd.DataFrame(
    [
        ('A', "29.08.2022", 100),
        ('A', "30.08.2022", 200),
        ('A', "31.08.2022", 300),
        ('B', "27.08.2022", 50),
        ('B', "38.08.2022", 1000),
        ('B', "30.08.2022", 10),
    ],
    columns=["customer_id", "purchase_date", "amount"],
)
df
customer_id purchase_date amount
A 29.08.2022 100
A 30.08.2022 200
A 31.08.2022 300
B 27.08.2022 50
B 38.08.2022 1000
B 30.08.2022 10
I can find the maximum using
df.groupby('customer_id')['amount'].max()
which gives me the maximum per customer:
customer_id
A 300
B 1000
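The Series returned by groupby().max() is indexed by customer_id, so one way to bring it back to row level is to look up each row's group maximum with Series.map and compare it with amount. This is a sketch of an alternative, not the method used in the answer below; the name max_per_customer is just for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    [
        ('A', "29.08.2022", 100),
        ('A', "30.08.2022", 200),
        ('A', "31.08.2022", 300),
        ('B', "27.08.2022", 50),
        ('B', "38.08.2022", 1000),
        ('B', "30.08.2022", 10),
    ],
    columns=["customer_id", "purchase_date", "amount"],
)

# Per-customer maximum, indexed by customer_id (A -> 300, B -> 1000)
max_per_customer = df.groupby('customer_id')['amount'].max()

# Map each row's customer_id to its group maximum, then compare row by row
df['is_max'] = df['amount'] == df['customer_id'].map(max_per_customer)
print(df)
```

This needs an intermediate Series and a lookup. groupby().transform does the same broadcast in one step.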
but I would like to flag this maximum in my original data in a new column is_max, like this:
customer_id purchase_date amount is_max
A 29.08.2022 100 false
A 30.08.2022 200 false
A 31.08.2022 300 true
B 27.08.2022 50 false
B 38.08.2022 1000 true
B 30.08.2022 10 false
How can I do this?
Answer:
Compare the column with groupby.transform('max'), which returns each group's maximum aligned to every row of the original DataFrame:
df['is_max'] = df['amount'].eq(df.groupby('customer_id')['amount'].transform('max'))
Output:
customer_id purchase_date amount is_max
0 A 29.08.2022 100 False
1 A 30.08.2022 200 False
2 A 31.08.2022 300 True
3 B 27.08.2022 50 False
4 B 38.08.2022 1000 True
5 B 30.08.2022 10 False
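Note that eq flags every row that ties for its group's maximum. If exactly one row per group should be flagged, one option is a sketch based on groupby().idxmax(), which returns the index label of the first occurrence of each group's maximum. The tied data and the column names is_max_all and is_max_first below are made up for illustration:

```python
import pandas as pd

# Hypothetical data: customer A has two purchases tied at 300
df = pd.DataFrame(
    [
        ('A', "29.08.2022", 300),
        ('A', "30.08.2022", 200),
        ('A', "31.08.2022", 300),
        ('B', "27.08.2022", 50),
        ('B', "30.08.2022", 10),
    ],
    columns=["customer_id", "purchase_date", "amount"],
)

# transform('max') broadcasts each group's max to every row, so all ties are flagged
df['is_max_all'] = df['amount'].eq(df.groupby('customer_id')['amount'].transform('max'))

# idxmax gives the index label of the first max per group, so one row per group is flagged
df['is_max_first'] = df.index.isin(df.groupby('customer_id')['amount'].idxmax())
print(df)
```

Which behaviour is right depends on whether ties should count as "the maximum".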