I want to get the porcentage for each row from the subgroup total. First, I created bins for the dataframe, then I do the group by.
I have the following dataframe:
data = pd.DataFrame({'barrio':['Almagro','Palermo','Almagro','Almagro','Palermo','Palermo','Almagro','Almagro'],
'fuente':['A','A','B','B','B','A','B','A'],
'valor':[1,6,5,3,5,10,8,4]})
bins = np.arange(0,data['valor'].max() 1, 2)
labels = ['-'.join(map(str,(x,y))) for x, y in zip(bins[:-1], bins[1:])]
data['bins'] = pd.cut(data['valor'], bins=bins,labels = labels, include_lowest=True)
data = data.groupby(['barrio','fuente','bins']).agg({'valor':'count'})
data = data.groupby(level=0).apply(lambda x: 100 * x / float(x.sum()))
The expected result would be:
valor
barrio fuente bins
Almagro A 0-2 50.00
2-4 50.00
4-6 0.00
6-8 0.00
8-10 0.00
B 0-2 0.00
2-4 33.00
4-6 33.00
6-8 33.00
8-10 0.00
Palermo A 0-2 0.00
2-4 0.00
4-6 50.00
6-8 0.00
8-10 50.00
B 0-2 0.00
2-4 0.00
4-6 100.00
6-8 0.00
8-10 0.00
But the result I get is:
valor
barrio fuente bins
Almagro A 0-2 20.00
2-4 20.00
4-6 0.00
6-8 0.00
8-10 0.00
B 0-2 0.00
2-4 20.00
4-6 20.00
6-8 20.00
8-10 0.00
Palermo A 0-2 0.00
2-4 0.00
4-6 33.33
6-8 0.00
8-10 33.33
B 0-2 0.00
2-4 0.00
4-6 33.33
6-8 0.00
8-10 0.00
How I can solve this problem?
thanks!!!
CodePudding user response:
You can use value_counts
with normalize
:
data.groupby(['barrio', 'fuente'])['bins'].value_counts(normalize=True)
Output:
barrio fuente bins
Almagro A 0-2 0.500000
2-4 0.500000
B 2-4 0.333333
4-6 0.333333
6-8 0.333333
Palermo A 4-6 0.500000
8-10 0.500000
B 4-6 1.000000
Name: bins, dtype: float64
Or you can use crosstab
, which uses the above in the background:
pd.crosstab([data['barrio'],data['fuente']],
data['bins'], normalize='index'
).stack()
Output:
barrio fuente bins
Almagro A 0-2 0.500000
2-4 0.500000
4-6 0.000000
6-8 0.000000
8-10 0.000000
B 0-2 0.000000
2-4 0.333333
4-6 0.333333
6-8 0.333333
8-10 0.000000
Palermo A 0-2 0.000000
2-4 0.000000
4-6 0.500000
6-8 0.000000
8-10 0.500000
B 0-2 0.000000
2-4 0.000000
4-6 1.000000
6-8 0.000000
8-10 0.000000
dtype: float64