I would like to count how many times in a row the current value has stayed unchanged in a dataframe column (a running count within each run of consecutive identical values), while at the same time grouping by a third column.
Example
Input columns: 'group' and 'col'; output column: 'count'.

```python
import pandas as pd

df_out = pd.DataFrame({'group': list('ABABABABABABABAB'),
                       'col': list('aaaaaaaabbbbaabb'),
                       'count': [int(c) for c in '0011223300110000']})
```
I need the solution below, taken from the question "groupby consecutive identical values in pandas dataframe and cumulative count of the number of occurrences", extended with a groupby on a third column ('group' in the example above):
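```python
import pandas as pd

df = pd.DataFrame({'group': list('ABABABABABABABAB'), 'col': list('aaaaaaaabbbbaabb')})
# Each change in 'col' starts a new run; cumcount numbers the rows inside each run
df['count'] = df.groupby(df['col'].ne(df['col'].shift()).cumsum()).cumcount()
df['count_wanted'] = [int(c) for c in '0011223300110000']
```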
The counting should be done separately for each value in column 'group'.
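To see why the single-column solution does not respect 'group', it helps to print its intermediate steps for the example frame. This is just an illustrative breakdown of the same `ne`/`shift`/`cumsum` idiom; the names `starts_run`, `run_id` and `counts` are introduced here for readability:

```python
import pandas as pd

df = pd.DataFrame({'group': list('ABABABABABABABAB'),
                   'col': list('aaaaaaaabbbbaabb')})

# True on every row whose value differs from the row directly above
starts_run = df['col'].ne(df['col'].shift())
# Cumulative sum of those flags gives each run of identical values its own id
run_id = starts_run.cumsum()
print(run_id.tolist())
# [1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 4]

# cumcount numbers the rows inside each run, starting from 0
counts = df.groupby(run_id).cumcount()
print(counts.tolist())
# [0, 1, 2, 3, 4, 5, 6, 7, 0, 1, 2, 3, 0, 1, 0, 1]
```

Because the run ids ignore 'group', rows from A and B share a run and are numbered together, which is why 'count' climbs to 7 instead of 3.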
Answer:
Try this, using a "helper" column to group by based on the change in 'col' from row to row:
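```python
# Run id: increments whenever 'col' differs from the row directly above
colgrp = (df['col'] != df['col'].shift()).cumsum()
# Number the rows within each (run, group) pair, starting from 0
df['col_gotten'] = df.groupby([colgrp, 'group']).cumcount()
df
```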
Output:
```
    group  col  count  count_wanted  col_gotten
0       A    a      0             0           0
1       B    a      1             0           0
2       A    a      2             1           1
3       B    a      3             1           1
4       A    a      4             2           2
5       B    a      5             2           2
6       A    a      6             3           3
7       B    a      7             3           3
8       A    b      0             0           0
9       B    b      1             0           0
10      A    b      2             1           1
11      B    b      3             1           1
12      A    a      0             0           0
13      B    a      1             0           0
14      A    b      0             0           0
15      B    b      1             0           0
```
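One caveat: `colgrp` is computed over the whole 'col' column, so it only matches per-group runs here because groups A and B switch values on the same rows. If they switch at different rows, a change in one group also splits the runs of the other. Below is a minimal sketch that detects changes within each group instead, assuming the same column names; `run_position_per_group` is a hypothetical helper name, not a pandas function, and the small `df2` frame is a made-up counterexample:

```python
import pandas as pd


def run_position_per_group(df, group_col, value_col):
    """Hypothetical helper: position of each row within its run of identical
    values, with runs detected separately inside each group."""
    # Compare each value with the previous row of the *same* group
    prev = df.groupby(group_col)[value_col].shift()
    changed = df[value_col].ne(prev).astype(int)
    # Number the runs inside each group; a global cumsum would let a
    # change in one group split the runs of another group
    run_id = changed.groupby(df[group_col]).cumsum().rename('run')
    # Position of each row inside its (group, run) pair
    return df.groupby([df[group_col], run_id]).cumcount()


df = pd.DataFrame({'group': list('ABABABABABABABAB'),
                   'col': list('aaaaaaaabbbbaabb')})
df['count'] = run_position_per_group(df, 'group', 'col')
print(df['count'].tolist())
# [0, 0, 1, 1, 2, 2, 3, 3, 0, 0, 1, 1, 0, 0, 0, 0]

# Counterexample: A goes a -> b, while B stays at 'a'
df2 = pd.DataFrame({'group': list('ABAB'), 'col': list('aaba')})
colgrp2 = (df2['col'] != df2['col'].shift()).cumsum()
helper_result = df2.groupby([colgrp2, 'group']).cumcount().tolist()
per_group_result = run_position_per_group(df2, 'group', 'col').tolist()
print(helper_result)     # [0, 0, 0, 0]  (B's second 'a' is not counted)
print(per_group_result)  # [0, 0, 0, 1]
```

The only change from the answer is that both the `shift` and the cumulative sum happen inside a groupby on 'group', so each group's runs are numbered independently of the other groups.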