Home > Back-end >  Customized normalization of pd.crosstab()
Customized normalization of pd.crosstab()

Time:10-12

I am working with a pandas dataframe and using the following crosstab formula to tabulate results:

ct = pd.crosstab(index=[df['Gender'], df['Education'],df['MaritalStatus']],
                                 columns=df['month'], normalize='columns').round(2)

The output of the cross-tabulation:

Output

I don't want the percentages to be calculated by the whole column. Instead, for the first row for example, my desired output would be to calculate the percentage of Females who are Level0 and Single / Number of Females who are Level 0 (As highlighted in yellow).

What's the way to accomplish this?

CodePudding user response:

Use GroupBy.transform and divide original ouput (also removed normalize='columns'):

ct = pd.crosstab(index=[df['Gender'], df['Education'],df['MaritalStatus']],
                        columns=df['month'])

#normalize by levels Gender and Education
ct = ct.div(ct.groupby(level=['Gender','Education']).transform('sum')).round(2)
  • Related