I have a pandas DataFrame with a protocol column. I would like to group its rows into packs of 3 per protocol and increment an index for each pack.
id protocol protocol_grp
1 ISD ISD1
2 ISD ISD1
3 ISD ISD1
4 IRQ IRQ1
5 IRQ IRQ1
6 IRQ IRQ1
7 IRQ IRQ2
8 IRQ IRQ2
9 IRQ IRQ2
10 IRQ IRQ3
11 ISD ISD2
12 ISD ISD2
13 ISD ISD2
14 ISD ISD3
15 IRQ IRQ3
16 IRQ IRQ3
17 IRQ IRQ4
The desired output is the protocol_grp column. Each time the same protocol has appeared 3 times, I want to increment the index by 1.
Hope this makes sense.
CodePudding user response:
You can use:
df['protocol_grp'] = df['protocol'] + df.groupby('protocol').cumcount() \
    .floordiv(3).add(1).astype(str)
print(df)
# Output
id protocol protocol_grp
0 1 ISD ISD1
1 2 ISD ISD1
2 3 ISD ISD1
3 4 IRQ IRQ1
4 5 IRQ IRQ1
5 6 IRQ IRQ1
6 7 IRQ IRQ2
7 8 IRQ IRQ2
8 9 IRQ IRQ2
9 10 IRQ IRQ3
10 11 ISD ISD2
11 12 ISD ISD2
12 13 ISD ISD2
13 14 ISD ISD3
14 15 IRQ IRQ3
15 16 IRQ IRQ3 # <- check this row
16 17 IRQ IRQ4
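To see why this works, here is a self-contained sketch that rebuilds the sample data from the question and splits the one-liner into steps. The intermediate names `pos` and `pack` are only for illustration and are not part of the original answer.

```python
import pandas as pd

# Sample data from the question: ids 1-17 with their protocols.
protocols = ["ISD"] * 3 + ["IRQ"] * 7 + ["ISD"] * 4 + ["IRQ"] * 3
df = pd.DataFrame({"id": range(1, 18), "protocol": protocols})

# Position of each row within its own protocol: 0, 1, 2, 3, ...
pos = df.groupby("protocol").cumcount()

# Every 3 consecutive positions share one pack number, starting at 1.
pack = pos.floordiv(3).add(1)

# Concatenate the protocol name with its pack number.
df["protocol_grp"] = df["protocol"] + pack.astype(str)
print(df)
```

Because cumcount counts per protocol across the whole frame, the IRQ rows with id 15 and 16 complete the pack started at id 10, even though ISD rows sit in between.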
CodePudding user response:
Use cumcount to number the rows within each protocol, then floor-divide by 3 and add 1:
df['protocol_grp'] = df['protocol'].add((df.groupby('protocol').cumcount()//3 + 1).astype(str))
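If the pack size may change, the same idea could be wrapped in a small helper. This is only a sketch: the function name `label_packs` and its `size` parameter are hypothetical, not something pandas provides.

```python
import pandas as pd

def label_packs(series: pd.Series, size: int = 3) -> pd.Series:
    """Append a 1-based pack number to each value, one pack per `size` repeats."""
    # Running count of each value's occurrences so far, starting at 0.
    pos = series.groupby(series).cumcount()
    return series + (pos // size + 1).astype(str)

# Small demo with packs of 2 instead of 3.
demo = pd.Series(["A", "A", "B", "A", "A", "B"])
labels = label_packs(demo, size=2)
print(labels.tolist())
```

With the question's data, df['protocol_grp'] = label_packs(df['protocol']) would give the same column as the one-liner above.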