How to ignore zero values when calculating the mean in a DataFrame

Time:04-05

import matplotlib.pyplot as plt
import pandas
import statistics
import numpy as np

names = ['place', 'age', 'NoMale', 'NoFemale', 'DNoMale', 'DNoFemale']
df = pandas.read_csv('data.csv', names=names)
SData = df.groupby('place')[['NoMale', 'NoFemale', 'DNoMale', 'DNoFemale']].sum().reset_index().sort_values(by=['place'], axis=0, ascending=True, inplace=False)

SData["MaleFRates"] = SData['NoMale']/SData['DNoMale']
SData["FemaleFRates"] = SData['NoFemale']/SData['DNoFemale']
SData_sorted = SData.sort_values('MaleFRates')

plt.barh(SData_sorted['place'], SData_sorted['MaleFRates'])

print(SData)

mean = [SData_sorted[ 'MaleFRates'] != 0].SData_sorted[ 'MaleFRates'].mean()

print(mean)

I want to get the mean of the death rate across the different places, but some of the values are zero. How can I exclude them?

Get the mean of the death rate, ignoring the zeros

CodePudding user response:

I believe it also depends on what the '0' values actually mean.

  1. If they are literally '0', then filter those rows out:
    df = SData_sorted[SData_sorted['MaleFRates'] != 0]
    print(df['MaleFRates'].mean())
    
  2. If they are values that were not measured, you can replace the '0's with NaN:
    df = SData_sorted.replace(0, np.nan)
    print(df['MaleFRates'].mean())
    
    By default, mean ignores NaN values.
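The two options above can be compared side by side. This is only a minimal sketch: the `toy` frame and its numbers are made up and stand in for `SData_sorted`; they are not the asker's data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for SData_sorted
toy = pd.DataFrame({
    "place": ["A", "B", "C"],
    "MaleFRates": [0.2, 0.0, 0.4],
})

# Option 1: drop the rows where the rate is exactly zero, then average
filtered_mean = toy.loc[toy["MaleFRates"] != 0, "MaleFRates"].mean()

# Option 2: mark zeros as missing; Series.mean skips NaN by default (skipna=True)
nan_mean = toy["MaleFRates"].replace(0, np.nan).mean()

print(filtered_mean, nan_mean)
```

Both approaches average only the non-zero rates here. The difference is mostly in what stays in the frame afterwards: option 1 removes the rows, option 2 keeps them with a missing value.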

CodePudding user response:

mean = SData_sorted[SData_sorted[ 'MaleFRates'] != 0]['MaleFRates'].mean()

Your syntax is incorrect: the boolean mask has to index the DataFrame first, and the column is selected afterwards.
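To see why the line in the question fails, here is a small sketch on made-up data (the `toy` frame is hypothetical). Wrapping the mask in square brackets on its own builds a plain Python list, and a list has no DataFrame attached to it.

```python
import pandas as pd

# Hypothetical toy data standing in for SData_sorted
toy = pd.DataFrame({"place": ["A", "B", "C"], "MaleFRates": [0.2, 0.0, 0.4]})

# Same shape as the line in the question: a list holding the mask,
# followed by an attribute lookup on that list.
try:
    broken = [toy["MaleFRates"] != 0].toy["MaleFRates"].mean()
    error_name = None
except AttributeError as exc:
    error_name = type(exc).__name__
    print(exc)

# Corrected order: DataFrame[mask][column]
mean = toy[toy["MaleFRates"] != 0]["MaleFRates"].mean()
print(mean)
```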

CodePudding user response:

You could do it like this:

import pandas as pd

data = [("A", 10), ("B", 0), ("C", 20)]
df = pd.DataFrame(data, columns=["Name", "Value"])
print(df)

sub_df = df.where(df["Value"] != 0)
print(sub_df)
print(sub_df.mean(numeric_only=True))

Here, where keeps the rows that satisfy the condition and replaces the others with NaN, which mean then skips. So in the specific case that you mentioned,

SData_sorted['MaleFRates'].where(SData_sorted['MaleFRates'] != 0).mean()

should work.
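One property of where that may matter for the plot: it keeps the original length and index, unlike boolean filtering. A minimal sketch, assuming a made-up `rates` Series in place of the `MaleFRates` column:

```python
import pandas as pd

# Hypothetical rates indexed by place
rates = pd.Series([0.2, 0.0, 0.4], index=["A", "B", "C"], name="MaleFRates")

masked = rates.where(rates != 0)   # same length; the zero becomes NaN
dropped = rates[rates != 0]        # shorter; the zero row is gone

print(masked)
print(dropped)
print(masked.mean(), dropped.mean())
```

Because `masked` still lines up with the original index, it can be assigned back as a column without realigning anything.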

CodePudding user response:

Why do you need to ignore the zero values? Keep in mind that zeros do count toward the mean (they add to the number of observations); only NaN values are skipped. If you want to remove zeros from a specific column, you can try

SData_sorted.loc[~(SData_sorted['MaleFRates'] == 0)]
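A quick check of how a zero affects the result, using made-up numbers rather than the asker's data:

```python
import pandas as pd

values = pd.Series([10, 0, 20])

# The zero is a real observation, so it is counted: 30 / 3
mean_with_zero = values.mean()

# After removing it with the negated mask, only two values remain: 30 / 2
mean_without_zero = values.loc[~(values == 0)].mean()

print(mean_with_zero, mean_without_zero)
```

The negated mask `~(values == 0)` selects the same rows as `values != 0`; which one to write is a matter of taste.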