I have the following dataframe:
d = {'ID':['X1','Y1','Z3','X1','X1','Z3','L1','H5','H5','H5'],'Prob':[0,0,1,1,1,1,0,0,0,0]}
frame = pd.DataFrame(d)
I am trying to count the number of times a 0 or 1 occurs within the prob column and then create new columns indicating the individual count:
ID zero_count one_count
X1 1 2
Y1 1 0
Z3 0 2
L1 1 0
H5 3 0
I have tried the following but not managed it so far:
frame.groupby(['ID','prob'])['index'].count().reset_index(name='zero_count')
Any ideas would be great.
CodePudding user response:
You need a crosstab
:
out = (pd
.crosstab(frame['ID'], frame['Prob'])
.rename(columns={0: 'zero', 1: 'one'})
.add_suffix('_count')
.reset_index().rename_axis(columns=None)
)
variant with num2words
for automatic conversion of numbers into words:
from num2words import num2words
out = (pd
.crosstab(frame['ID'], frame['Prob'])
.rename(columns=num2words)
.add_suffix('_count')
.reset_index().rename_axis(columns=None)
)
output:
ID zero_count one_count
0 H5 3 0
1 L1 1 0
2 X1 1 2
3 Y1 1 0
4 Z3 0 2
CodePudding user response:
You could use the the function unstack
to display your different groupby results in columns. Like this:
frame.groupby(['ID','Prob']).value_counts().unstack(level=1,fill_value=0)
Output:
Prob 0 1
ID
H5 3 0
L1 1 0
X1 1 2
Y1 1 0
Z3 0 2