Remove not representative values from a list

Time:03-08

I'm wondering how to remove unrepresentative samples from a population.

Let's assume that I have the following numbers as a list:

pop = [1, 2, 3, 4, 1, 3, 3, 4, 5, 7, 2, 1, 1, 1, 1, 1, 1002]

Is there any way to inspect the list and remove the value 1002, since it is completely unrepresentative and can distort calculations?

To illustrate this, consider the following computation:

>>> from statistics import mean
>>> pop = [1, 2, 3, 4, 1, 3, 3, 4, 5, 7, 2, 1, 1, 1, 1, 1, 1002]
>>> mean(pop)
61.294117647058826
>>> pop1 = [1, 2, 3, 4, 1, 3, 3, 4, 5, 7, 2, 1, 1, 1, 1, 1]
>>> mean(pop1)
2.5

CodePudding user response:

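import numpy as np
import pandas as pd

data = {"pop": [1, 2, 3, 4, 1, 3, 3, 4, 5, 7, 2, 1, 1, 1, 1, 1, 1002]}

df = pd.DataFrame(data)

# First and third quartiles of the column
q3 = np.quantile(df["pop"], 0.75)
q1 = np.quantile(df["pop"], 0.25)

# Interquartile range and the usual 1.5 * IQR fences
iqr = q3 - q1
upper_bound = q3 + 1.5 * iqr
lower_bound = q1 - 1.5 * iqr

# Keep only the rows that fall inside the fences
df_wo_outliers = df[(df["pop"] >= lower_bound) & (df["pop"] <= upper_bound)]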

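But be aware that this rule can also remove legitimate values. For this list q1 = 1 and q3 = 4, so the bounds are -3.5 and 8.5 and only 1002 is dropped; with a narrower interquartile range, a value such as 7 could be removed as well.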

df_wo_outliers["pop"].mean()
# 2.5
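
If you would rather stay with a plain list and skip pandas, the same IQR rule can be sketched with the standard library alone (Python 3.8+ for statistics.quantiles). Treat this as a sketch under assumptions: remove_outliers and its k parameter are names made up here for illustration, and method="inclusive" is picked so that the quartiles should line up with NumPy's default interpolation for this list.

```python
from statistics import mean, quantiles


def remove_outliers(values, k=1.5):
    """Keep the values inside [q1 - k*IQR, q3 + k*IQR].

    Hypothetical helper for illustration; k=1.5 is the multiplier
    used in the answer above.
    """
    # quantiles(..., n=4) returns the three cut points q1, median, q3
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lower <= v <= upper]


pop = [1, 2, 3, 4, 1, 3, 3, 4, 5, 7, 2, 1, 1, 1, 1, 1, 1002]
cleaned = remove_outliers(pop)

print(cleaned)        # [1, 2, 3, 4, 1, 3, 3, 4, 5, 7, 2, 1, 1, 1, 1, 1]
print(mean(cleaned))  # 2.5
```

The multiplier controls how aggressive the filter is. With k=0.5 the upper bound for this list drops to 5.5, and remove_outliers(pop, k=0.5) would discard 7 along with 1002, so it is worth checking what is removed before trusting the cleaned result.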