I am having a similar dataframe like below.
group_name field_A
0 g1 3
1 g1 49
2 g1 90
3 g2 78
4 g2 92
5 g2 76
6 g2 48
7 g3 33
8 g3 44
9 g3 55
I want to process above dataframe and check group wise values and if the value of field_A is above 75%tile of the all group values then add 'Yes' in a new column which is 'isAboveThreshold'. so it will look like
group_name field_A top isTop75
0 g1 3 69.5 No
1 g1 49 69.5 No
2 g1 90 69.5 Yes
3 g2 78 81.5 No
4 g2 92 81.5 Yes
5 g2 76 81.5 No
6 g2 48 81.5 No
7 g3 33 49.5 No
8 g3 44 49.5 No
9 g3 55 49.5 Yes
CodePudding user response:
Use GroupBy.transform
with lambda function and Series.quantile
, compare by Series.ge
and pass to numpy.where
:
s = df.groupby('group_name')['field_A'].transform(lambda x: x.quantile(.75))
df['isTop75'] = np.where(df['field_A'].ge(s), 'Yes', 'No')
print(df)
group_name field_A isTop75
0 g1 3 No
1 g1 49 No
2 g1 90 Yes
3 g2 78 No
4 g2 92 Yes
5 g2 76 No
6 g2 48 No
7 g3 33 No
8 g3 44 No
9 g3 55 Yes
CodePudding user response:
Here is a useful function for that, which can be passed to transform
or agg
after groupby
:
import pandas as pd
import numpy as np
df = pd.DataFrame({
'group_name' : ['g1', 'g1', 'g1', 'g2', 'g2', 'g2', 'g2', 'g3', 'g3', 'g3'],
'field_A': [3, 49, 90, 78, 92, 76, 48, 33, 44, 55]})
def percentile(n):
"""Percentile function that can be passed to pandas agg() function
AND changes column name.
"""
def percentile_(x):
return x.quantile(n)
percentile_.__name__ = 'q%s' % n
return percentile_
df['top'] = df.groupby('group_name').transform(percentile(.75))
df['isTop75'] = np.where(df.field_A >= df.top, 'Yes', 'No')
print(df)
output:
group_name field_A top isTop75
0 g1 3 69.5 No
1 g1 49 69.5 No
2 g1 90 69.5 Yes
3 g2 78 81.5 No
4 g2 92 81.5 Yes
5 g2 76 81.5 No
6 g2 48 81.5 No
7 g3 33 49.5 No
8 g3 44 49.5 No
9 g3 55 49.5 Yes
CodePudding user response:
There is a quantile
method in pandas. Apply it after grouping your df by group_name
:
new_df = pd.DataFrame()
for _, group in df.groupby("group_name"):
group["top"] = group["field_A"].quantile(q=0.75)
group["isTop75"] = group.apply(lambda row: "Yes" if row["field_A"] > row["top"] else "No", axis=1)
new_df = pd.concat([new_df, group])
print(new_df)
Output:
group_name field_A top isTop75
0 g1 3 69.5 No
1 g1 49 69.5 No
2 g1 90 69.5 Yes
3 g2 78 81.5 No
4 g2 92 81.5 Yes
5 g2 76 81.5 No
6 g2 48 81.5 No
7 g3 33 49.5 No
8 g3 44 49.5 No
9 g3 55 49.5 Yes
Edit: You can get get rid of the loop by using transform
:
df["top"] = df.groupby("group_name")["field_A"].transform(lambda x: x.quantile(.75))
df["isTop75"] = df.apply(lambda row: "Yes" if row["field_A"] > row["top"] else "No", axis=1)
print(df)