I want to remove values that do not lie between -2σ and 2σ. How do I remove these outliers using the IQR?
I implemented this equation as follows.
iqr = df['abc'].quantile(0.75) - df['abc'].quantile(0.25)
cond1 = (df['abc'] > df['abc'].quantile(0.75) + 2 * iqr)
cond2 = (df['abc'] < df['abc'].quantile(0.25) - 2 * iqr)
df[cond1 & cond2]
Is this the right way?
CodePudding user response:
This is not right. Your iqr is almost never equal to σ. Quartiles and standard deviations are not the same thing.
Fortunately, you can easily compute the standard deviation of a numerical Series using Series.std().
sigma = df['abc'].std()
cond1 = (df['abc'] > df['abc'].mean() - 2 * sigma)
cond2 = (df['abc'] < df['abc'].mean() + 2 * sigma)
df[cond1 & cond2]
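If you do want an IQR-based filter rather than a σ-based one, the usual form keeps values inside the fences Q1 - k * IQR and Q3 + k * IQR. Here is a minimal sketch that puts both filters side by side. The sample data is made up for illustration, and k = 1.5 is the common Tukey convention, not something taken from the question. Note that pandas exposes quantiles through Series.quantile (there is no Series.percentile), and that Series.between is inclusive at both ends by default.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 1000 roughly normal values plus two obvious outliers.
rng = np.random.default_rng(0)
values = np.append(rng.normal(50, 5, 1000), [120.0, -40.0])
df = pd.DataFrame({"abc": values})

# Sigma-based filter: keep rows within mean +/- 2 standard deviations.
mean = df["abc"].mean()
sigma = df["abc"].std()
within_sigma = df[df["abc"].between(mean - 2 * sigma, mean + 2 * sigma)]

# IQR-based filter: keep rows inside the fences Q1 - k*IQR and Q3 + k*IQR.
q1 = df["abc"].quantile(0.25)
q3 = df["abc"].quantile(0.75)
iqr = q3 - q1
k = 1.5  # assumed multiplier (Tukey's convention); tune it for your data
within_iqr = df[df["abc"].between(q1 - k * iqr, q3 + k * iqr)]

print(f"sigma={sigma:.2f}, iqr={iqr:.2f}")
print(len(df), len(within_sigma), len(within_iqr))
```

The two filters generally keep different rows, because the IQR and σ measure spread differently. For normally distributed data the IQR is roughly 1.35σ, so 2 * iqr is not a substitute for 2 * sigma.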