Let say that we have this dataframe:
d = {'col1': [1, 2,0,55,12], 'col2': [3, 4,44,34,46], 'col3': [A,A,B,B,A] }
df = pd.DataFrame(data=d)
df
col1 col2 col3
0 1 3 A
1 2 4 A
2 0 44 B
3 55 34 B
4 12 46 A
I want another column that would count the numbers of the A and B separately as follows:
col1 col2 col3 count
0 1 3 A 2
1 2 4 A 2
2 0 44 B 2
3 55 34 B 2
4 12 46 A 1
I have tried the group by, but it does not do what I want, could you please help please ?
CodePudding user response:
You can create consecutive groups by compare shifted values for not eqaual with cumulative sum and pass to GroupBy.transform
with GroupBy.size
:
g = df['col3'].ne(df['col3'].shift()).cumsum()
df['count'] = df.groupby(g)['col3'].transform('size')
print (df)
col1 col2 col3 count
0 1 3 A 2
1 2 4 A 2
2 0 44 B 2
3 55 34 B 2
4 12 46 A 1
Or alternative with Series.value_counts
and Series.map
:
s = df['col3'].ne(df['col3'].shift()).cumsum()
df['count'] = s.map(s.value_counts())
CodePudding user response:
import pandas as pd
d= {'col1': [1, 2,0,55,12], 'col2': [3, 4,44,34,46], 'col3':
["A","A","B","B","A"] }
df = pd.DataFrame(data=d)
len_of_col1=len(df.col1)
len_of_col2=len(df.col2)
len_of_col3=len(df.col3)
print(len_of_col1,len_of_col2,len_of_col3)