groupby and then count identical consecutive values in pandas

Time:11-09

I would like to count, for each row of a dataframe, how many consecutive rows the current value has remained unchanged. This should be done per group, that is, while also grouping on another column.

Example

Input columns: 'group', 'col', output column: 'count'
import pandas as pd

df_out = pd.DataFrame({'group':list('ABABABABABABABAB'), 'col':list('aaaaaaaabbbbaabb'), 'count':list('0011223300110000')})

I need the solution below (from "groupby consecutive identical values in pandas dataframe and cumulative count of the number of occurences") extended with a groupby on a third column ('group' in the example above).

df = pd.DataFrame({'group':list('ABABABABABABABAB'), 'col':list('aaaaaaaabbbbaabb')})

df['count'] = df.groupby(df['col'].ne(df['col'].shift()).cumsum()).cumcount()

df['count_wanted'] = list('0011223300110000')

Counting should be done separately for each value in the 'group' column.

CodePudding user response:

Try this, using a "helper" series that increments each time 'col' changes from one row to the next, and grouping by it together with 'group':

colgrp = (df['col'] != df['col'].shift()).cumsum()
df['col_gotten'] = df.groupby([colgrp, 'group']).cumcount()
df

Output:

   group col  count count_wanted  col_gotten
0      A   a      0            0           0
1      B   a      1            0           0
2      A   a      2            1           1
3      B   a      3            1           1
4      A   a      4            2           2
5      B   a      5            2           2
6      A   a      6            3           3
7      B   a      7            3           3
8      A   b      0            0           0
9      B   b      1            0           0
10     A   b      2            1           1
11     B   b      3            1           1
12     A   a      0            0           0
13     B   a      1            0           0
14     A   b      0            0           0
15     B   b      1            0           0
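
Note that the helper above compares each row with the previous row of the whole frame, not with the previous row of the same group. That gives the wanted result here because A and B always carry the same value on adjacent rows. If that is not guaranteed in your data, a variant that tracks runs inside each group might look like the sketch below. The function name count_runs_per_group and the small frame df2 are made up for illustration and are not part of the original question or answer:

```python
import pandas as pd


def count_runs_per_group(df, group_col, value_col):
    """Hypothetical helper: 0-based position of each row inside its run of
    identical consecutive values, where runs are tracked per group."""
    # Previous value within the same group (NaN for each group's first row)
    prev = df.groupby(group_col)[value_col].shift()
    # 1 wherever a new run starts within the group, else 0
    starts = (df[value_col] != prev).astype(int)
    # Number the runs separately inside each group
    run_id = starts.groupby(df[group_col]).cumsum().rename('run_id')
    # Position of each row inside its (group, run) pair
    return df.groupby([df[group_col], run_id]).cumcount()


# The frame from the question
df = pd.DataFrame({'group': list('ABABABABABABABAB'),
                   'col': list('aaaaaaaabbbbaabb')})
df['per_group'] = count_runs_per_group(df, 'group', 'col')

# Illustrative frame where A and B hold different values on adjacent rows
df2 = pd.DataFrame({'group': list('ABAB'), 'col': list('abab')})
colgrp = (df2['col'] != df2['col'].shift()).cumsum()
df2['whole_frame'] = df2.groupby([colgrp, 'group']).cumcount()
df2['per_group'] = count_runs_per_group(df2, 'group', 'col')
```

On the question's data, 'per_group' equals 'count_wanted'. On df2, 'whole_frame' is 0 on every row because 'col' changes on every row of the frame, while 'per_group' is 0, 0, 1, 1 because within A (and within B) the value never changes.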