I'm a beginner Python coder and I want to build a Python function that calculates a specific indicator.
As an example, the data looks like:
ID  status      Age  Gender
01  healthy     16   Male
02  un_healthy  14   Female
03  un_healthy  22   Male
04  healthy     12   Female
05  healthy     33   Female
I want to build a function that calculates the percentage of healthy people, i.e. healthy / (healthy + un_healthy). My attempt:
def health_rate(healthy, un_healthy, age){
    if (age >= 15):
        if (gender == "Male"):
            return rateMale = (count(healthy) / (count(healthy) + count(un_healthy)))
        Else
            return rateFemale = (count(healthy) / (count(healthy) + count(un_healthy)))
    Else
        return print("underage");
and then just use .apply,
but the logic isn't right and I still don't get my desired output. I want to return the Male rate and the Female rate.
CodePudding user response:
You could use pivot_table (df is your dataframe):
df = df[df.Age >= 15].pivot_table(
    index="status", columns="Gender", values="ID",
    aggfunc="count", margins=True, fill_value=0
)
Result for your sample dataframe:
Gender      Female  Male  All
status
healthy          1     1    2
un_healthy       0     1    1
All              1     2    3
If you want percentages:
df = (df / df.loc["All", :] * 100).drop("All")
Result:
Gender      Female  Male        All
status
healthy      100.0  50.0  66.666667
un_healthy     0.0  50.0  33.333333
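If you only need the two numbers from the question (Male rate and Female rate), here is a minimal self-contained sketch of the same idea using pd.crosstab instead of pivot_table. The sample data is copied from the question; the names adults, rates, male_rate and female_rate are just illustrative, and the age cutoff of 15 is taken from the question's attempt.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": ["01", "02", "03", "04", "05"],
    "status": ["healthy", "un_healthy", "un_healthy", "healthy", "healthy"],
    "Age": [16, 14, 22, 12, 33],
    "Gender": ["Male", "Female", "Male", "Female", "Female"],
})

# Keep only people aged 15 or older, as in the question's attempt.
adults = df[df["Age"] >= 15]

# normalize="columns" makes every Gender column sum to 1,
# so multiplying by 100 turns the shares into percentages.
rates = pd.crosstab(adults["status"], adults["Gender"], normalize="columns") * 100
print(rates)

# Pick out the healthy percentage for each gender.
male_rate = rates.loc["healthy", "Male"]
female_rate = rates.loc["healthy", "Female"]
print(male_rate, female_rate)
```

Note that this avoids .apply entirely: the rate is a property of a whole group, not of a single row, so it is easier to filter first and then count per group.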
CodePudding user response:
df[col_name].value_counts(normalize=True)
gives you the proportions for the desired column. Here's how you can parameterize it:
def health_percentages(df, col_name):
    return df[col_name].value_counts(normalize=True) * 100
Example:
import pandas as pd

data = [[1, 'healthy', 16, 'M'], [2, 'un_healthy', 14, 'F'], [3, 'un_healthy', 22, 'M'], [4, 'healthy', 12, 'F'], [5, 'healthy', 33, 'F']]
df = pd.DataFrame(data, columns=['ID', 'status', 'Age', 'Gender'])
print(df)
print(health_percentages(df, 'status'))
#output:
   ID      status  Age Gender
0   1     healthy   16      M
1   2  un_healthy   14      F
2   3  un_healthy   22      M
3   4     healthy   12      F
4   5     healthy   33      F
healthy       60.0
un_healthy    40.0
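The function above gives the overall split and ignores Age and Gender. If you want the healthy rate per gender for people aged 15 and over, as the question asks, one possible variation is sketched below. The function name health_rate_by_gender and the min_age parameter are my own assumptions, not something from the question.

```python
import pandas as pd

def health_rate_by_gender(df, min_age=15):
    """Percentage of healthy people per gender among rows with Age >= min_age."""
    eligible = df[df["Age"] >= min_age]
    # True where the person is healthy; the mean of a boolean
    # column is the fraction of True values in each group.
    is_healthy = eligible["status"].eq("healthy")
    return is_healthy.groupby(eligible["Gender"]).mean() * 100

data = [[1, 'healthy', 16, 'M'], [2, 'un_healthy', 14, 'F'], [3, 'un_healthy', 22, 'M'],
        [4, 'healthy', 12, 'F'], [5, 'healthy', 33, 'F']]
df = pd.DataFrame(data, columns=['ID', 'status', 'Age', 'Gender'])

rates = health_rate_by_gender(df)
print(rates)
```

With the sample data only IDs 1, 3 and 5 pass the age filter, so the result is based on two males and one female.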