I saw this question and want to update it slightly.
import pandas as pd
data = {'Group':['A', 'A', 'A'], 'Age':[18, 200, 17]}
df = pd.DataFrame(data)
I want to create a new column 'Outlier', where each row is flagged as True or False depending on whether its 'Age' lies more than 3 standard deviations from the mean.
My desired output:
data = {'Group':['A', 'A', 'A'], 'Age':[18, 200, 17], 'Outlier':['False', 'True', 'False']}
df = pd.DataFrame(data)
df
CodePudding user response:
Try with groupby and transform:
# z-score of each Age within its own Group (x.std() is the sample standard deviation)
zscores = df.groupby('Group')['Age'].transform(lambda x: (x - x.mean()) / x.std())
df["Outlier"] = zscores.abs() > 3
>>> df
Group Age Outlier
0 A 18 False
1 A 200 False
2 A 17 False
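Note that nothing is flagged here, not even 200. With the sample standard deviation (ddof=1, the pandas default), a single point in a sample of size n can never be more than (n - 1) / sqrt(n) standard deviations from the mean, which is about 1.15 for n = 3. A minimal sketch to check this on the question's data (the variable name max_possible is only for illustration):

```python
import math
import pandas as pd

df = pd.DataFrame({"Group": ["A", "A", "A"], "Age": [18, 200, 17]})

# Per-group z-scores using the sample standard deviation (ddof=1, the pandas default).
zscores = df.groupby("Group")["Age"].transform(lambda x: (x - x.mean()) / x.std())

n = len(df)
# Upper bound on |z| for any single point in a sample of size n.
max_possible = (n - 1) / math.sqrt(n)

print(zscores.abs().max())  # close to, but not above, the bound
print(max_possible)         # about 1.155, well below 3
```

So a 3-sigma rule only starts to flag anything once a group has at least 11 rows, since (n - 1) / sqrt(n) first exceeds 3 at n = 11.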
To get outliers regardless of the group, use:
zscores = (df["Age"]-df["Age"].mean())/df["Age"].std()
df["Outlier"] = zscores.abs()>3
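To see the flag actually turn True, the groups need more rows than the three in the question. Below is a rough sketch with made-up data (the ages in ages_a and ages_b are hypothetical, not from the question), using the same per-group z-score approach:

```python
import pandas as pd

# Hypothetical data: 19 typical ages plus one extreme value in group A,
# and a small group B with no extreme values.
ages_a = [18, 17, 19, 18, 20, 17, 18, 19, 18, 17,
          19, 18, 20, 17, 18, 19, 18, 17, 19, 200]
ages_b = [30, 31, 29, 30]
df = pd.DataFrame({
    "Group": ["A"] * len(ages_a) + ["B"] * len(ages_b),
    "Age": ages_a + ages_b,
})

# z-score of each Age relative to the mean and sample std of its own group.
zscores = df.groupby("Group")["Age"].transform(lambda x: (x - x.mean()) / x.std())
df["Outlier"] = zscores.abs() > 3

# Show only the rows flagged as outliers.
print(df[df["Outlier"]])
```

With this data only the row with Age 200 should be flagged. Keep in mind that the extreme value itself inflates the group's mean and standard deviation, so the 3-sigma rule is fairly conservative on small groups.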