Detect and fix outliers in a pandas series

Time:09-16

I have a pandas Series with some outlier values. Here's some mock data:

import pandas as pd
df = pd.DataFrame({'col1': [1200, 400, 50, 75, 8, 9, 8, 7, 6, 5, 4, 6, 6, 8, 3, 6, 6, 7, 6]})

I'd like to replace outliers, i.e. values that are 3 or more standard deviations away from the mean, with the mean value.

Answer:

Let's do:

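mean, std = df['col1'].mean(), df['col1'].std()
thrs = mean + 3 * std
df['col1'] = df['col1'].astype(float)  # the mean is a float; avoids an int64 dtype clash on assignment
df.loc[df['col1'] >= thrs, 'col1'] = mean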
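To check which rows that threshold actually catches, here is a quick sketch that rebuilds the mock data and prints each value's z-score (distance from the mean in standard deviations). The names `z` and `flagged` are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1200, 400, 50, 75, 8, 9, 8, 7, 6, 5, 4, 6, 6, 8, 3, 6, 6, 7, 6]})

mean = df['col1'].mean()
std = df['col1'].std()  # sample standard deviation (ddof=1), the pandas default
thrs = mean + 3 * std

# z-score: how many standard deviations each value lies from the mean
z = (df['col1'] - mean) / std
# values at or above the one-sided threshold used in the answer
flagged = df.loc[df['col1'] >= thrs, 'col1']

print(f"mean={mean:.2f}  std={std:.2f}  threshold={thrs:.2f}")
print(pd.DataFrame({'col1': df['col1'], 'z': z.round(2)}))
print("flagged:", flagged.tolist())
```

With this mock data only 1200 crosses the threshold. 400 stays well under 3 standard deviations because the large values inflate the standard deviation itself, so this rule can miss smaller outliers when several are present.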

Answer:

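import numpy as np

std_dev = df["col1"].std()
mean = df["col1"].mean()
# Replace values 3 or more standard deviations from the mean (on either side) with the mean
df["col1"] = np.where((df["col1"] - mean).abs() >= 3 * std_dev, mean, df["col1"])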
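If you need this on more than one column, the same rule can be wrapped in a small reusable function. This is a sketch, not a pandas API: the name `replace_outliers_with_mean` and the `k` parameter are hypothetical. It checks distance from the mean in both directions and returns a new Series instead of modifying the input.

```python
import pandas as pd


def replace_outliers_with_mean(s: pd.Series, k: float = 3.0) -> pd.Series:
    """Hypothetical helper: return a copy of `s` where values at least `k`
    sample standard deviations from the mean (either side) become the mean."""
    mean = s.mean()
    std = s.std()  # ddof=1, the pandas default
    is_outlier = (s - mean).abs() >= k * std
    # mask() replaces entries where the condition is True; cast to float first
    # because the mean is generally not an integer
    return s.astype(float).mask(is_outlier, mean)


df = pd.DataFrame({'col1': [1200, 400, 50, 75, 8, 9, 8, 7, 6, 5, 4, 6, 6, 8, 3, 6, 6, 7, 6]})
df['col1'] = replace_outliers_with_mean(df['col1'])
print(df.head())
```

Note that the mean and standard deviation are computed with the outliers still included, so the replacement value is pulled toward them as well; whether that is acceptable depends on your data.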