import matplotlib.pyplot as plt
import pandas
import statistics
import numpy as np
names = ['place', 'age', 'NoMale', 'NoFemale', 'DNoMale', 'DNoFemale']
df = pandas.read_csv('data.csv', names=names)
SData = df.groupby('place')[['NoMale', 'NoFemale', 'DNoMale', 'DNoFemale']].sum().reset_index().sort_values(by=['place'], axis=0, ascending=True, inplace=False)
SData["MaleFRates"] = SData['NoMale']/SData['DNoMale']
SData["FemaleFRates"] = SData['NoFemale']/SData['DNoFemale']
SData_sorted = SData.sort_values('MaleFRates')
plt.barh(SData_sorted['place'], SData_sorted['MaleFRates'])
print(SData)
mean = [SData_sorted[ 'MaleFRates'] != 0].SData_sorted[ 'MaleFRates'].mean()
print(mean)
I want to get the mean of the death rate across the different places, but one of the values is zero. How can I exclude it?
Get the mean of the death rate ignoring the zero
CodePudding user response:
I believe it also depends on what the '0' values mean.
- If they are literally 0, filter those rows out:
df = SData_sorted[SData_sorted['MaleFRates'] != 0]
print(df['MaleFRates'].mean())
- If they stand for values that were not measured, you can replace the 0s with NaN. By default, mean ignores NaN values:
df = SData_sorted.replace(0, np.nan)
print(df['MaleFRates'].mean())
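The two options above can be compared on a small made-up frame. This is only a sketch: the frame `rates` and its values are hypothetical stand-ins for SData_sorted, not the asker's data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for SData_sorted; the values are made up.
rates = pd.DataFrame({
    "place": ["A", "B", "C"],
    "MaleFRates": [0.10, 0.0, 0.30],
})

# Option 1: drop the rows whose rate is exactly zero, then average the column.
filtered_mean = rates[rates["MaleFRates"] != 0]["MaleFRates"].mean()

# Option 2: mark zeros as missing; mean() skips NaN by default (skipna=True).
nan_mean = rates["MaleFRates"].replace(0, np.nan).mean()

print(filtered_mean, nan_mean)
```

Both options should give the same mean for the column. The difference is that the NaN version keeps the row for place B in the frame, which may matter if you still want to plot or report it.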
CodePudding user response:
mean = SData_sorted[SData_sorted[ 'MaleFRates'] != 0]['MaleFRates'].mean()
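Your syntax is incorrect: the boolean condition has to go inside the square brackets that index SData_sorted, and the column is selected after that.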
CodePudding user response:
You could do it like this
import pandas as pd
data = [("A", 10), ("B", 0), ("C", 20)]
df = pd.DataFrame(data, columns=["Name", "Value"])
print(df)
sub_df = df.where(df["Value"] != 0)
print(sub_df)
print(sub_df["Value"].mean())
Here where, together with a condition, masks the row that holds a 0 value: the row is replaced by NaN, which mean skips. So in the specific case that you mentioned,
SData_sorted.where(SData_sorted['MaleFRates'] != 0)['MaleFRates'].mean()
should work.
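To make the behaviour of where concrete, here is a rough sketch that contrasts it with plain boolean indexing on the same toy data. The names `masked` and `dropped` are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["A", "B", "C"], "Value": [10, 0, 20]})

# where() keeps the original shape: the row that fails the condition
# is filled with NaN in every column instead of being removed.
masked = df.where(df["Value"] != 0)

# Boolean indexing drops the failing row entirely.
dropped = df[df["Value"] != 0]

print(masked.shape, dropped.shape)

# Select the numeric column before averaging, since "Name" holds strings.
print(masked["Value"].mean(), dropped["Value"].mean())
```

Either way the mean of the column comes out the same, because mean skips NaN by default. The choice mostly depends on whether you want to keep the row positions intact.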
CodePudding user response:
Why do you need to ignore the zero value? Keep in mind that a zero does count toward the mean (only NaN is skipped), so excluding it changes the result. If you want to remove zeros from a specific column, you can try
SData_sorted.loc[~(SData_sorted['MaleFRates'] == 0)]
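Whether zeros influence the result can be checked directly. The sketch below uses a hypothetical frame `rates` with made-up values, and passes a column label to .loc so that the row filter and the column selection happen in one step.

```python
import pandas as pd

# Hypothetical data; "MaleFRates" mirrors the column name from the question.
rates = pd.DataFrame({
    "place": ["A", "B", "C"],
    "MaleFRates": [0.5, 0.0, 1.5],
})

# Row mask and column label together: returns a Series of non-zero rates.
nonzero = rates.loc[rates["MaleFRates"] != 0, "MaleFRates"]

mean_with_zeros = rates["MaleFRates"].mean()   # zero is counted as a value
mean_without_zeros = nonzero.mean()            # zero row excluded

print(mean_with_zeros, mean_without_zeros)
```

The two numbers differ, so the decision to drop zeros should follow from what a zero means in the data.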