How do I remove outliers in pandas?

Time:04-26

I want to remove values that do not lie between -2σ and 2σ. How do I remove these outliers using the IQR?

I implemented this equation as follows.

iqr = df['abc'].quantile(0.75) - df['abc'].quantile(0.25)

cond1 = (df['abc'] > df['abc'].quantile(0.75) + 2 * iqr)
cond2 = (df['abc'] < df['abc'].quantile(0.25) - 2 * iqr)

df[cond1 & cond2]

Is this the right way?

CodePudding user response:

This is not right. The IQR is almost never equal to σ: the interquartile range and the standard deviation are different measures of spread.
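
To see the difference concretely, here is a small sketch that computes both quantities on the same data. The sample is hypothetical (random draws from a standard normal distribution, not the asker's data); for normally distributed values the IQR comes out at roughly 1.35σ rather than σ.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 100,000 draws from a standard normal distribution (sigma = 1).
rng = np.random.default_rng(0)
s = pd.Series(rng.normal(loc=0.0, scale=1.0, size=100_000))

sigma = s.std()                             # sample standard deviation (ddof=1)
iqr = s.quantile(0.75) - s.quantile(0.25)   # interquartile range

ratio = iqr / sigma  # close to 1.35 for normal data, not 1
print(f"sigma={sigma:.3f}  iqr={iqr:.3f}  iqr/sigma={ratio:.3f}")
```

For other distributions the ratio is different again, so the two cannot be swapped for each other.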

Fortunately, you can easily compute the standard deviation of a numerical Series using Series.std().

sigma = df['abc'].std()

cond1 = (df['abc'] > df['abc'].mean() - 2 * sigma)
cond2 = (df['abc'] < df['abc'].mean() + 2 * sigma)

df[cond1 & cond2]
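
As a runnable sketch, the snippet below applies the 2σ rule to a small made-up DataFrame and, for comparison, an IQR-based rule. The data, the variable names, and the multiplier k = 1.5 (Tukey's usual convention; the question used 2) are assumptions for illustration only. In both rules the two masks describe the range to keep, which is why they are combined with &. Masks that describe the two outlier regions, as in the question, can never both be true for the same row.

```python
import pandas as pd

# Hypothetical example data; 'abc' is the column name used in the question.
df = pd.DataFrame({"abc": [10, 11, 12, 11, 10, 12, 11, 10, 12, 11, 100]})
col = df["abc"]

# Rule 1: keep rows strictly within mean +/- 2 standard deviations.
mean, sigma = col.mean(), col.std()
within_sigma = df[(col > mean - 2 * sigma) & (col < mean + 2 * sigma)]

# Rule 2: keep rows strictly within the IQR fences.
k = 1.5  # assumed multiplier (Tukey's convention); adjust as needed
q1, q3 = col.quantile(0.25), col.quantile(0.75)
iqr = q3 - q1
within_iqr = df[(col > q1 - k * iqr) & (col < q3 + k * iqr)]

print(within_sigma["abc"].tolist())
print(within_iqr["abc"].tolist())
```

Neither expression modifies df in place; assign the result to a variable to keep the filtered rows.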