Finding the mean of a column; but excluding a singular value-CodePudding

Imagine I have a dataset that is like so:

         ID           birthyear  weight
0        619040       1962       0.1231231
1        600161       1963       0.981742
2      25602033       1963       1.3123124     
3        624870       1987       10,000

and I want to get the mean of the column weight, but the obvious 10,000 is hindering the actual mean. In this situation I cannot change the value but must work around it, this is what I've got so far, but obviously it's including that last value.

avg_num_items = df_cleaned['trans_quantity'].mean()

translist = df_cleaned['trans_quantity'].tolist()

my dataframe is df_cleaned and the column I'm actually working with is 'trans_quantity' so how do I go about the mean while working around that value?

CodePudding user response：

Since you added SQL in your tags, In SQL you'd want to exclude it in the WHERE clause:

SELECT AVG(trans_quantity)
FROM your_data_base
WHERE trans_quantity <>  10,000

In Pandas:

avg_num_items = df_cleaned[df_cleaned["trans_quantity"] != 10000]["trans_quantity"].mean()

You can also replace your value with a NAN and skip it in the mean:

avg_num_items =  df_cleaned["trans_quantity"].replace(10000, np.nan).mean(skipna=True)

CodePudding user response：

With pandas, ensure you have numeric data (10,000 is a string), filter the values above threshold and use the mean:

(pd.to_numeric(df['weight'], errors='coerce')
   .loc[lambda x: x<10000]
   .mean()
)

output: 0.8057258333333334