How to ignore zero values when calculating the mean in a DataFrame

import matplotlib.pyplot as plt
import pandas
import statistics
import numpy as np

names = ['place', 'age', 'NoMale', 'NoFemale', 'DNoMale', 'DNoFemale']
df = pandas.read_csv('data.csv', names=names)
SData = df.groupby('place')[['NoMale', 'NoFemale', 'DNoMale', 'DNoFemale']].sum().reset_index().sort_values(by='place')

SData["MaleFRates"] = SData['NoMale']/SData['DNoMale']
SData["FemaleFRates"] = SData['NoFemale']/SData['DNoFemale']
SData_sorted = SData.sort_values('MaleFRates')

plt.barh(SData_sorted['place'],SData_sorted[ 'MaleFRates'])

print(SData)

mean = [SData_sorted[ 'MaleFRates'] != 0].SData_sorted[ 'MaleFRates'].mean()

print(mean)

I want to get the mean of the death rate across the different places, but the data contains a zero value. How can I exclude it so that the mean ignores the zero?

CodePudding user response：

I believe it depends on what the '0' values mean.

If they are literal zeros, filter those rows out first:

df = SData_sorted[SData_sorted['MaleFRates'] != 0]
print(df.mean(numeric_only=True))

If they stand for values that were not measured, you can replace the '0's with NaN.
```
df = SData_sorted.replace(0, np.nan)
print(df.mean(numeric_only=True))
```
By default, mean() skips NaN values (skipna=True).
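
As a small illustration of the NaN approach, here is a sketch on made-up data. The frame `rates` and its values are hypothetical stand-ins, not the contents of the asker's data.csv, and it assumes the zeros really are missing measurements.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for SData_sorted; the values are invented for illustration.
rates = pd.DataFrame({
    "place": ["A", "B", "C"],
    "MaleFRates": [0.2, 0.0, 0.4],
})

# Replace zeros with NaN in the rate column only, leaving the other columns untouched.
masked = rates["MaleFRates"].replace(0, np.nan)

# Series.mean() skips NaN by default, so only the two non-zero rates are averaged.
mean_without_zeros = masked.mean()

# For comparison: the plain mean still counts the zero as a value.
mean_with_zeros = rates["MaleFRates"].mean()

print(mean_without_zeros)
print(mean_with_zeros)
```

Restricting the replacement to one column avoids turning legitimate zeros in other columns into NaN by accident.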

CodePudding user response：

mean = SData_sorted[SData_sorted[ 'MaleFRates'] != 0]['MaleFRates'].mean()

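Your syntax is incorrect. The condition has to be used to index SData_sorted, and the column is then selected from the filtered result.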
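
To spell out the steps behind that one-liner, here is a sketch on invented numbers. The frame `frame` is hypothetical; only the column name MaleFRates is taken from the question.

```python
import pandas as pd

# Hypothetical data; only the column name comes from the question.
frame = pd.DataFrame({
    "place": ["A", "B", "C", "D"],
    "MaleFRates": [0.5, 0.0, 0.25, 0.75],
})

# Step 1: build a boolean mask, True where the rate is non-zero.
nonzero = frame["MaleFRates"] != 0

# Step 2: index the DataFrame with the mask to keep only the matching rows.
filtered = frame[nonzero]

# Step 3: select the column from the filtered rows and average it.
result = filtered["MaleFRates"].mean()

print(result)
```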

CodePudding user response：

You could do it like this:

import pandas as pd

data = [("A", 10), ("B", 0), ("C", 20)]
df = pd.DataFrame(data, columns=["Name", "Value"])
print(df)

sub_df = df.where(df["Value"] != 0)
print(sub_df)
print(sub_df.mean(numeric_only=True))

Here, where combined with a condition replaces every row that fails the condition (the rows with a 0 value) with NaN, and mean then skips those rows. So in the specific case that you mentioned,

SData_sorted.where(SData_sorted['MaleFRates'] != 0).mean(numeric_only=True)

should work.

CodePudding user response：

Are you sure you need to ignore the zero values? Keep in mind that a zero adds nothing to the sum but is still counted as a value, so it does lower the mean. If you want to remove the zeros from a specific column, you can try

SData_sorted.loc[~(SData_sorted['MaleFRates'] == 0)]
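
To check the effect on your own data, you can compare the mean before and after dropping the zero rows. A sketch with invented values (the frame below is hypothetical):

```python
import pandas as pd

# Hypothetical data, chosen so the arithmetic is easy to follow.
frame = pd.DataFrame({
    "place": ["A", "B", "C"],
    "MaleFRates": [1.0, 0.0, 2.0],
})

# Drop the rows whose rate is exactly zero, using the .loc pattern from above.
no_zeros = frame.loc[~(frame["MaleFRates"] == 0)]

# Mean over all three rows: (1.0 + 0.0 + 2.0) / 3
mean_all = frame["MaleFRates"].mean()

# Mean over the two remaining rows: (1.0 + 2.0) / 2
mean_nonzero = no_zeros["MaleFRates"].mean()

print(mean_all, mean_nonzero)
```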