How to structure a Python function that takes input from a DataFrame to calculate a specific indicator

I'm a beginner Python coder and I want to build a Python function that calculates a specific indicator.

For example, the data looks like this:

ID    status        Age    Gender
01    healthy       16     Male
02    un_healthy    14     Female
03    un_healthy    22     Male
04    healthy       12     Female
05    healthy       33     Female

I want to build a function that calculates the percentage of healthy people, i.e. healthy / (healthy + un_healthy):

def health_rate(healthy, un_healthy,age){
    if (age >= 15):
        if (gender == "Male"):
            return rateMale= (count(healthy)/(count(healthy) + count(un_healthy)))
        Else
            return rateFemale= (count(healthy)/(count(healthy) + count(un_healthy)))
    Else 
        return print("underage");

and then just use .apply.

But the logic isn't right and I still don't get my desired output. I want to return the Male rate and the Female rate.

CodePudding user response：

You could use pivot_table (where df is your dataframe):

df = df[df.Age >= 15].pivot_table(
    index="status", columns="Gender", values="ID",
    aggfunc="count", margins=True, fill_value=0
)

Result for your sample dataframe:

Gender      Female  Male  All
status                       
healthy          1     1    2
un_healthy       0     1    1
All              1     2    3

If you want percentages:

df = (df / df.loc["All", :] * 100).drop("All")

Result:

Gender      Female  Male        All
status                             
healthy      100.0  50.0  66.666667
un_healthy     0.0  50.0  33.333333
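
If you need this as a function that returns the Male and Female rates directly, the same age filter can be combined with groupby. This is only a sketch: the function name health_rate comes from the question, while the min_age parameter and the exact column names are assumptions based on the sample data.

```python
import pandas as pd

def health_rate(df, min_age=15):
    """Percentage of healthy people per gender among rows with Age >= min_age."""
    # Keep only the rows that meet the age threshold (15 is assumed from the question).
    adults = df[df["Age"] >= min_age]
    # Comparing to "healthy" gives True/False; the mean of booleans is the share of True.
    is_healthy = adults["status"].eq("healthy")
    return is_healthy.groupby(adults["Gender"]).mean() * 100

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": ["01", "02", "03", "04", "05"],
    "status": ["healthy", "un_healthy", "un_healthy", "healthy", "healthy"],
    "Age": [16, 14, 22, 12, 33],
    "Gender": ["Male", "Female", "Male", "Female", "Female"],
})

rates = health_rate(df)
print(rates["Male"], rates["Female"])  # 50.0 100.0
```

The function works on the whole dataframe at once, so there is no need for .apply, which would call it one row at a time and could not count across rows.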

CodePudding user response：

df[col_name].value_counts(normalize=True) gives you the proportions for the desired column. Here's how you can parameterize it:

def health_percentages(df, col_name):
    return df[col_name].value_counts(normalize=True)*100

Example:

import pandas as pd

data = [[1, 'healthy', 16, 'M'], [2, 'un_healthy', 14, 'F'], [3, 'un_healthy', 22, 'M'], [4, 'healthy', 12, 'F'], [5, 'healthy', 33, 'F']]

df = pd.DataFrame(data, columns = ['ID','status', 'Age', 'Gender'])
print(df)
print(health_percentages(df, 'status'))

#output:
   ID      status  Age Gender
0   1     healthy   16      M
1   2  un_healthy   14      F
2   3  un_healthy   22      M
3   4     healthy   12      F
4   5     healthy   33      F

healthy       60.0
un_healthy    40.0
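
The function above looks at all rows together. To get the per-gender split and the age cutoff the question asks for, one option is to filter first and group by Gender before calling value_counts. A sketch under those assumptions; health_percentages_by_gender and min_age are hypothetical names, not pandas API:

```python
import pandas as pd

def health_percentages_by_gender(df, col_name, min_age=15):
    # Keep only rows at or above the age threshold (15 is assumed from the question).
    subset = df[df["Age"] >= min_age]
    # Within each gender, value_counts(normalize=True) gives proportions that sum to 1.
    return subset.groupby("Gender")[col_name].value_counts(normalize=True) * 100

data = [[1, 'healthy', 16, 'M'], [2, 'un_healthy', 14, 'F'], [3, 'un_healthy', 22, 'M'],
        [4, 'healthy', 12, 'F'], [5, 'healthy', 33, 'F']]
df = pd.DataFrame(data, columns=['ID', 'status', 'Age', 'Gender'])

result = health_percentages_by_gender(df, 'status')
print(result)
```

The result is a Series indexed by (Gender, status), so a single rate can be read with result.loc[("M", "healthy")]. A status that never occurs for a gender after filtering is simply absent from the index rather than shown as 0.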