Add number to groups by count of values in Pandas DataFrame

I have a pandas DataFrame with a protocol column. I would like to group its rows into packs of 3 per protocol and increment an index for each new pack.

id      protocol    protocol_grp
1       ISD     ISD1
2       ISD     ISD1
3       ISD     ISD1
4       IRQ     IRQ1
5       IRQ     IRQ1
6       IRQ     IRQ1
7       IRQ     IRQ2
8       IRQ     IRQ2
9       IRQ     IRQ2
10      IRQ     IRQ3
11      ISD     ISD2
12      ISD     ISD2
13      ISD     ISD2
14      ISD     ISD3
15      IRQ     IRQ3
16      IRQ     IRQ3
17      IRQ     IRQ4

The desired output is the protocol_grp column. Each time a protocol has appeared 3 times (the rows do not need to be consecutive), the index for that protocol should increase by 1.

Hope this makes sense.

CodePudding user response：

You can use:

df['protocol_grp'] = df['protocol'] + df.groupby('protocol').cumcount() \
                                        .floordiv(3).add(1).astype(str)
print(df)

# Output
    id protocol protocol_grp
0    1      ISD         ISD1
1    2      ISD         ISD1
2    3      ISD         ISD1
3    4      IRQ         IRQ1
4    5      IRQ         IRQ1
5    6      IRQ         IRQ1
6    7      IRQ         IRQ2
7    8      IRQ         IRQ2
8    9      IRQ         IRQ2
9   10      IRQ         IRQ3
10  11      ISD         ISD2
11  12      ISD         ISD2
12  13      ISD         ISD2
13  14      ISD         ISD3
14  15      IRQ         IRQ3
15  16      IRQ         IRQ3  # <- check this row
16  17      IRQ         IRQ4
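
To see why this works, it can help to look at the intermediate values. The sketch below rebuilds the sample data from the question (how the DataFrame is constructed is an assumption, as are the helper names `pos` and `pack`) and prints each step next to the final label.

```python
import pandas as pd

# Rebuild the sample data from the question (the construction is assumed).
protocols = ["ISD"] * 3 + ["IRQ"] * 7 + ["ISD"] * 4 + ["IRQ"] * 3
df = pd.DataFrame({"id": range(1, len(protocols) + 1), "protocol": protocols})

# Position of each row within its own protocol, starting at 0.
pos = df.groupby("protocol").cumcount()

# Every 3 positions share one pack number; add 1 so packs start at 1.
pack = pos.floordiv(3).add(1)

df["protocol_grp"] = df["protocol"] + pack.astype(str)

# Show the intermediate columns alongside the result.
print(df.assign(pos=pos, pack=pack))
```

Because cumcount counts per protocol across the whole frame, rows 15 and 16 continue the IRQ count started at row 10, which is why they land in IRQ3.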

CodePudding user response：

Use cumcount to number the rows within each protocol, then floor-divide by 3 and add 1:

df['protocol_grp'] = df['protocol'].add((df.groupby('protocol').cumcount()//3 + 1).astype(str))
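
If the pack size might change, the same idea could be wrapped in a small helper. This is only a sketch: the function name `add_pack_label` and its `size` parameter are hypothetical and not part of pandas.

```python
import pandas as pd

def add_pack_label(df, col, size=3):
    """Return the values of `col` suffixed with a 1-based pack number.

    Hypothetical helper: a new pack starts after every `size` rows
    that share the same value in `col`.
    """
    pack = df.groupby(col).cumcount() // size + 1
    return df[col] + pack.astype(str)

# Small demo with packs of 2 instead of 3.
demo = pd.DataFrame({"protocol": ["A", "A", "B", "A", "B", "A"]})
demo["protocol_grp"] = add_pack_label(demo, "protocol", size=2)
print(demo)
```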