I have a pandas Series with some outlier values. Here's some mock data:
import pandas as pd
df = pd.DataFrame({'col1': [1200, 400, 50, 75, 8, 9, 8, 7, 6, 5, 4, 6, 6, 8, 3, 6, 6, 7, 6]})
I'd like to replace the outliers, i.e. values that are 3 or more standard deviations away from the mean, with the mean value.
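Written out as a formula, the rule replaces each value $x_i$ as follows, where $\bar{x}$ is the column mean and $s$ is the sample standard deviation (the `ddof=1` estimate that pandas' `std()` returns by default):

$$
x_i' =
\begin{cases}
\bar{x} & \text{if } |x_i - \bar{x}| \ge 3s \\
x_i & \text{otherwise}
\end{cases}
$$

For the mock data, $\bar{x} = 1820 / 19 \approx 95.79$ and $s \approx 282.29$, so $3s \approx 846.88$. Only 1200 is that far from the mean; 400 is about 304.2 away and is not flagged, because the single extreme value inflates $s$.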
Answer:
Let's do:
thrs = df['col1'].mean() + 3 * df['col1'].std()  # upper cutoff: mean + 3 standard deviations
df.loc[df['col1'] >= thrs, 'col1'] = df['col1'].mean()
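The `.loc` approach above only checks the upper tail (values above mean + 3 std). A self-contained sketch that also catches values 3 std below the mean, assuming that two-sided reading of the question, might look like this. It converts the column to float first so that writing the non-integer mean into an integer column does not run into a dtype conflict:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1200, 400, 50, 75, 8, 9, 8, 7, 6, 5, 4, 6, 6, 8, 3, 6, 6, 7, 6]})

# Compute the statistics once, before any value is overwritten
mean = df['col1'].mean()
std = df['col1'].std()  # sample standard deviation (ddof=1), pandas' default

# Work in float so assigning the non-integer mean does not clash with the int64 dtype
df['col1'] = df['col1'].astype(float)

# Flag values at least 3 standard deviations away from the mean, on either side
is_outlier = (df['col1'] - mean).abs() >= 3 * std
df.loc[is_outlier, 'col1'] = mean

print(df['col1'].tolist())
```

With the mock data only the first value (1200) is flagged; it becomes roughly 95.79 and every other value is left unchanged.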
Answer:
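import numpy as np

std_dev = df["col1"].std()
mean = df["col1"].mean()
# Compare each value's distance from the mean (not the raw value) with 3 standard deviations
df["col1"] = np.where((df["col1"] - mean).abs() >= 3 * std_dev, mean, df["col1"])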
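If you need this on more than one column, a small reusable helper may be handier. The sketch below is one possible shape; the name `replace_outliers_with_mean` and the `k` parameter are hypothetical, chosen here for illustration. It uses pandas' `Series.mask`, which replaces entries where the condition is `True` with the given value:

```python
import pandas as pd


def replace_outliers_with_mean(s: pd.Series, k: float = 3.0) -> pd.Series:
    """Hypothetical helper: return a copy of `s` where values at least
    `k` standard deviations from the mean are replaced by the mean."""
    mean = s.mean()
    std = s.std()  # sample standard deviation, ddof=1
    # mask() swaps in `mean` wherever the condition is True; cast to float
    # first because the mean is generally not an integer
    return s.astype(float).mask((s - mean).abs() >= k * std, mean)


df = pd.DataFrame({'col1': [1200, 400, 50, 75, 8, 9, 8, 7, 6, 5, 4, 6, 6, 8, 3, 6, 6, 7, 6]})
df['col1'] = replace_outliers_with_mean(df['col1'])
print(df['col1'].head(3).tolist())
```

Returning a new Series instead of modifying it in place keeps the original data available for comparison, and `k` lets you try a stricter or looser cutoff. For example, `replace_outliers_with_mean(pd.Series([0, 0, 0, 0, 10]), k=1)` replaces only the 10 with the mean, 2.0.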