I have the following dataframe:
Customer ID | Country | Is True |
---|---|---|
123 | China | 1 |
124 | China | 1 |
125 | Colombia | 0 |
126 | Bangladesh | 0 |
127 | Bangladesh | 1 |
128 | China | 0 |
129 | Colombia | 0 |
130 | Bangladesh | 0 |
131 | Bangladesh | 0 |
132 | China | 1 |
I want to create something like pivot that will give me:
Country | Count Customers | % of population | # True | True Ratio (of Country) |
---|---|---|---|---|
Bangladesh | 4 | 40% | 1 | 25% |
China | 4 | 40% | 3 | 75% |
Colombia | 2 | 20% | 0 | 0% |
Note: The second column is calculated on the column level- it's the ratio of the specific country out of the entire population (I have 4 customers from China out of 10, that's 40%). The true ratio is on the row level: within China I have 1 'True' which is 25% of all (4) China. I managed to add the count calculations:
Country = df.groupby('Country').agg(Count Customers = ('CustomerID', 'count'), TrueCount = ('is True','sum'))
How do I add the 2 ratios?
CodePudding user response:
Dataframe:
import pandas as pd
df = pd.DataFrame(
{
'Customer ID':range(123,133),
'Country':['China', 'China', 'Colombia', 'Bangladesh', 'Bangladesh', 'China', 'Colombia', 'Bangladesh', 'Bangladesh', 'China'],
'is True':[1,1,0,0,1,0,0,0,0,1]
}
)
First you can use the groupby to create your first two columns (Count Customers & #True). Then you can use these values to create the other two columns:
agg_df = df.groupby('Country').agg(
CountCustomers=('Customer ID', 'count'),
TrueCount=('is True','sum')
)
agg_df['% of population'] = agg_df['CountCustomers'] / agg_df['CountCustomers'].sum()
agg_df['True Ratio (of Country)'] = agg_df['TrueCount'] / agg_df['CountCustomers']
Result:
'CountCustomers' 'TrueCount' '% of population' 'True Ratio (of Country)'
Country
Bangladesh 4 1 0.4 0.25
China 4 3 0.4 0.75
Colombia 2 0 0.2 0.00