Create a counter that iterates over a column in a dataframe, and counts when a condition in the colu-CodePudding

I currently have a column in my dataframe called step, that I want to use to set a counter on. It contains a bunch of repeating numbers. I want to create a new column against this, that has a counter that increments when a certain condition is met. The condition is when the number changes for a fourth time in the column step, the counter will increment by 1, and then repeat the process. Here is an example of my code, and what I'd like to acheive:

df = pd.DataFrame({"step": [1,1,1,2,2,2,2,3,3,3,4,4,4,5,5,5,5,6,6,6,7,7,7,7,8,8,8,8,8,9,9,9,9,7,7,
7,8,8,8,9,9,7,7,8,8,8,9,9,9,7]})

df['counter'] = df['step'].cumsum() #This will increment when it sees a fourth different number, and repeat

So ideally, my output would look like this:

print(df['step'])

[1,1,1,2,2,2,2,3,3,3,4,4,4,5,5,5,5,6,6,6,7,7,7,7,8,8,8,8,8,9,9,9,9,7,7,
7,8,8,8,9,9,7,7,8,8,8,9,9,9,7,7]

print(df['counter'])
[0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,3,3,
3,3,3,3,3,3,4,4,4,4,4,4,4,4,5,5]

The numbers in step will vary, but the counter will always increment when the fourth different value in the sequence is identified and reset the counter. I know I could probably do this with if statements, but my dataframe is large and I would rather do it in a faster way of comparison, if possible. Any help would be greatly appreciated!

CodePudding user response：

You can convert your step column into categories and then count on the category codes:

import pandas as pd

df = pd.DataFrame({"step": [1,1,1,2,2,2,2,3,3,3,4,4,4,5,5,5,5,6,6,6,7,7,7,7,8,8,8,8,8,9,9,9,9,10]})
df["counter"] = df.step.astype("category").values.codes // 3

Result:

    step  counter
0      1        0
1      1        0
2      1        0
3      2        0
4      2        0
5      2        0
6      2        0
7      3        0
8      3        0
9      3        0
10     4        1
11     4        1
12     4        1
13     5        1
14     5        1
15     5        1
16     5        1
17     6        1
18     6        1
19     6        1
20     7        2
21     7        2
22     7        2
23     7        2
24     8        2
25     8        2
26     8        2
27     8        2
28     8        2
29     9        2
30     9        2
31     9        2
32     9        2
33    10        3

Update for changed data (see comment):

df = pd.DataFrame({"step": [1,1,1,2,2,2,2,3,3,3,4,4,4,5,5,5,5,6,6,6,7,7,7,7,8,8,8,8,8,9,9,9,9,7,7,7,8,8,8,9,9,7,7,8,8,8,9,9,9,7,7]})
df['counter'] = (df.step.diff().fillna(0).ne(0).cumsum() // 3).astype(int)

    step  counter
0      1        0
1      1        0
2      1        0
3      2        0
4      2        0
5      2        0
6      2        0
7      3        0
8      3        0
9      3        0
10     4        1
11     4        1
12     4        1
13     5        1
14     5        1
15     5        1
16     5        1
17     6        1
18     6        1
19     6        1
20     7        2
21     7        2
22     7        2
23     7        2
24     8        2
25     8        2
26     8        2
27     8        2
28     8        2
29     9        2
30     9        2
31     9        2
32     9        2
33     7        3
34     7        3
35     7        3
36     8        3
37     8        3
38     8        3
39     9        3
40     9        3
41     7        4
42     7        4
43     8        4
44     8        4
45     8        4
46     9        4
47     9        4
48     9        4
49     7        5
50     7        5

CodePudding user response：

Compare the current and previous row in step column to identify boundaries(location of transitions), then use cumsum to assign number to groups of rows and floor divide by 3 to create counter

m = df.step != df.step.shift()
df['counter'] = (m.cumsum() - 1) // 3

    step  counter
0      1        0
1      1        0
2      1        0
3      2        0
4      2        0
5      2        0
6      2        0
7      3        0
8      3        0
9      3        0
10     4        1
11     4        1
12     4        1
13     5        1
14     5        1
15     5        1
16     5        1
17     6        1
18     6        1
19     6        1
20     7        2
21     7        2
22     7        2
23     7        2
24     8        2
25     8        2
26     8        2
27     8        2
28     8        2
29     9        2
30     9        2
31     9        2
32     9        2
33     7        3
34     7        3
35     7        3
36     8        3
37     8        3
38     8        3
39     9        3
40     9        3
41     7        4
42     7        4
43     8        4
44     8        4
45     8        4
46     9        4
47     9        4
48     9        4
49     7        5