Home > Software engineering >  pandas group by with different calculations (row level and column level)
pandas group by with different calculations (row level and column level)

Time:10-31

I have the following dataframe:

Customer ID Country Is True
123 China 1
124 China 1
125 Colombia 0
126 Bangladesh 0
127 Bangladesh 1
128 China 0
129 Colombia 0
130 Bangladesh 0
131 Bangladesh 0
132 China 1

I want to create something like pivot that will give me:

Country Count Customers % of population # True True Ratio (of Country)
Bangladesh 4 40% 1 25%
China 4 40% 3 75%
Colombia 2 20% 0 0%

Note: The second column is calculated on the column level- it's the ratio of the specific country out of the entire population (I have 4 customers from China out of 10, that's 40%). The true ratio is on the row level: within China I have 1 'True' which is 25% of all (4) China. I managed to add the count calculations:

Country = df.groupby('Country').agg(Count Customers = ('CustomerID', 'count'), TrueCount = ('is True','sum'))

How do I add the 2 ratios?

CodePudding user response:

Dataframe:

import pandas as pd
df = pd.DataFrame(
    {
        'Customer ID':range(123,133),
        'Country':['China', 'China', 'Colombia', 'Bangladesh', 'Bangladesh', 'China', 'Colombia', 'Bangladesh', 'Bangladesh', 'China'],
        'is True':[1,1,0,0,1,0,0,0,0,1]
    }
)

First you can use the groupby to create your first two columns (Count Customers & #True). Then you can use these values to create the other two columns:

agg_df = df.groupby('Country').agg(
        CountCustomers=('Customer ID', 'count'),
        TrueCount=('is True','sum')
    )
agg_df['% of population'] = agg_df['CountCustomers'] / agg_df['CountCustomers'].sum()
agg_df['True Ratio (of Country)'] = agg_df['TrueCount'] / agg_df['CountCustomers']

Result:

            'CountCustomers'    'TrueCount' '% of population'   'True Ratio (of Country)'
Country             
Bangladesh  4                   1           0.4                 0.25
China       4                   3           0.4                 0.75
Colombia    2                   0           0.2                 0.00
  • Related