I want to count the number of lines for each different groups-CodePudding

Let say that we have this dataframe:

d = {'col1': [1, 2,0,55,12], 'col2': [3, 4,44,34,46], 'col3': [A,A,B,B,A] } 
df = pd.DataFrame(data=d)
df

     col1  col2    col3
0     1     3      A
1     2     4      A
2     0     44     B
3     55    34     B
4     12    46     A

I want another column that would count the numbers of the A and B separately as follows:

     col1  col2    col3  count
0     1     3      A     2 
1     2     4      A     2
2     0     44     B     2
3     55    34     B     2
4     12    46     A     1

I have tried the group by, but it does not do what I want, could you please help please ?

CodePudding user response：

You can create consecutive groups by compare shifted values for not eqaual with cumulative sum and pass to GroupBy.transform with GroupBy.size:

g = df['col3'].ne(df['col3'].shift()).cumsum()
df['count'] = df.groupby(g)['col3'].transform('size')
print (df)
   col1  col2 col3  count
0     1     3    A      2
1     2     4    A      2
2     0    44    B      2
3    55    34    B      2
4    12    46    A      1

Or alternative with Series.value_counts and Series.map:

s = df['col3'].ne(df['col3'].shift()).cumsum()
df['count'] = s.map(s.value_counts())

CodePudding user response：

import pandas as pd

d= {'col1': [1, 2,0,55,12], 'col2': [3, 4,44,34,46], 'col3': 
["A","A","B","B","A"] }
df = pd.DataFrame(data=d)
len_of_col1=len(df.col1)
len_of_col2=len(df.col2)
len_of_col3=len(df.col3)
print(len_of_col1,len_of_col2,len_of_col3)