Home > Software design >  Python Pandas grouping into different columns
Python Pandas grouping into different columns

Time:09-13

I have the following dataframe:

d = {'ID':['X1','Y1','Z3','X1','X1','Z3','L1','H5','H5','H5'],'Prob':[0,0,1,1,1,1,0,0,0,0]}
frame = pd.DataFrame(d)

I am trying to count the number of times a 0 or 1 occurs within the prob column and then create new columns indicating the individual count:


ID     zero_count    one_count
X1     1             2
Y1     1             0
Z3     0             2
L1     1             0
H5     3             0

I have tried the following but not managed it so far:


frame.groupby(['ID','prob'])['index'].count().reset_index(name='zero_count')

Any ideas would be great.

CodePudding user response:

You need a crosstab:

out = (pd
   .crosstab(frame['ID'], frame['Prob'])
   .rename(columns={0: 'zero', 1: 'one'})
   .add_suffix('_count')
   .reset_index().rename_axis(columns=None)
)

variant with num2words for automatic conversion of numbers into words:

from num2words import num2words

out = (pd
   .crosstab(frame['ID'], frame['Prob'])
   .rename(columns=num2words)
   .add_suffix('_count')
   .reset_index().rename_axis(columns=None)
)

output:

   ID  zero_count  one_count
0  H5           3          0
1  L1           1          0
2  X1           1          2
3  Y1           1          0
4  Z3           0          2

CodePudding user response:

You could use the the function unstack to display your different groupby results in columns. Like this:

frame.groupby(['ID','Prob']).value_counts().unstack(level=1,fill_value=0)

Output:

Prob    0   1
ID      
H5      3   0
L1      1   0
X1      1   2
Y1      1   0
Z3      0   2
  • Related