Flagging outliers in dataframe, creating a new column in pandas

Time:09-24

I saw this question and want to modify it slightly.

import pandas as pd

data = {'Group':['A', 'A', 'A'], 'Age':[18, 200, 17]}
df = pd.DataFrame(data)

I want to create a new column 'Outlier', where each row is flagged as True or False depending on whether its value lies more than 3 standard deviations from the mean.

My desired output:

data = {'Group':['A', 'A', 'A'], 'Age':[18, 200, 17], 'Outlier':[False, True, False]}
df = pd.DataFrame(data)
df

CodePudding user response:

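Try groupby with transform, selecting the 'Age' column explicitly so that only that column is standardized:

# z-score of each Age within its own Group (sample standard deviation, ddof=1)
zscores = df.groupby('Group')['Age'].transform(lambda x: (x - x.mean()) / x.std())
# flag rows lying more than 3 standard deviations from their group mean
df["Outlier"] = zscores.abs() > 3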

>>> df
  Group  Age  Outlier
0     A   18    False
1     A  200    False
2     A   17    False
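
Note that the all-False result above is expected for this data. With the sample standard deviation (the pandas default, ddof=1), the largest absolute z-score a group of n values can reach is (n - 1) / sqrt(n), which is about 1.15 for n = 3, so a threshold of 3 cannot flag anything until a group has at least 11 rows. A minimal sketch of this, using made-up ages that are not from the question:

```python
import pandas as pd

# The three ages from the question: the largest |z| stays far below 3.
small = pd.Series([18, 200, 17])
small_max_z = ((small - small.mean()) / small.std()).abs().max()
small_bound = (len(small) - 1) / len(small) ** 0.5  # about 1.15

# Hypothetical data: one group of eleven ages, one of them extreme.
df = pd.DataFrame({
    "Group": ["A"] * 11,
    "Age": [18, 17, 19, 18, 17, 18, 19, 17, 18, 19, 200],
})

# Same per-group z-score as in the answer above.
zscores = df.groupby("Group")["Age"].transform(lambda x: (x - x.mean()) / x.std())
df["Outlier"] = zscores.abs() > 3

print(small_max_z, small_bound)
print(df)
```

With eleven rows the value 200 only just crosses the threshold, so for small groups a lower cutoff may be more practical.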

To flag outliers regardless of the group, use:

zscores = (df["Age"] - df["Age"].mean()) / df["Age"].std()
df["Outlier"] = zscores.abs() > 3
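
The per-group and whole-column variants above could be folded into one helper. This is only a sketch: the function name `flag_outliers`, its parameters `by` and `threshold`, and the sample data are all hypothetical and not part of the original question or answer.

```python
import pandas as pd

def flag_outliers(df, column, by=None, threshold=3.0):
    """Return a boolean Series that is True where |z-score| > threshold.

    Hypothetical helper: z-scores are computed per group when `by` is
    given, otherwise over the whole column. Uses the sample standard
    deviation (ddof=1), matching the pandas default.
    """
    values = df[column]
    if by is None:
        zscores = (values - values.mean()) / values.std()
    else:
        grouped = values.groupby(df[by])
        zscores = (values - grouped.transform("mean")) / grouped.transform("std")
    # NaN z-scores (e.g. single-row groups) compare as False.
    return zscores.abs() > threshold

# Made-up sample data with two groups.
df = pd.DataFrame({
    "Group": ["A"] * 11 + ["B"] * 3,
    "Age": [18, 17, 19, 18, 17, 18, 19, 17, 18, 19, 200, 40, 41, 39],
})

df["Outlier"] = flag_outliers(df, "Age", by="Group")  # within each group
overall = flag_outliers(df, "Age")                    # ignoring groups
print(df)
```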