Home > front end >  Create a counter that iterates over a column in a dataframe, and counts when a condition in the colu
Create a counter that iterates over a column in a dataframe, and counts when a condition in the colu

Time:07-15

I currently have a column in my dataframe called step, that I want to use to set a counter on. It contains a bunch of repeating numbers. I want to create a new column against this, that has a counter that increments when a certain condition is met. The condition is when the number changes for a fourth time in the column step, the counter will increment by 1, and then repeat the process. Here is an example of my code, and what I'd like to acheive:

df = pd.DataFrame({"step": [1,1,1,2,2,2,2,3,3,3,4,4,4,5,5,5,5,6,6,6,7,7,7,7,8,8,8,8,8,9,9,9,9,7,7,
7,8,8,8,9,9,7,7,8,8,8,9,9,9,7]})

df['counter'] = df['step'].cumsum() #This will increment when it sees a fourth different number, and repeat

So ideally, my output would look like this:

print(df['step'])

[1,1,1,2,2,2,2,3,3,3,4,4,4,5,5,5,5,6,6,6,7,7,7,7,8,8,8,8,8,9,9,9,9,7,7,
7,8,8,8,9,9,7,7,8,8,8,9,9,9,7,7]

print(df['counter'])
[0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,3,3,
3,3,3,3,3,3,4,4,4,4,4,4,4,4,5,5]

The numbers in step will vary, but the counter will always increment when the fourth different value in the sequence is identified and reset the counter. I know I could probably do this with if statements, but my dataframe is large and I would rather do it in a faster way of comparison, if possible. Any help would be greatly appreciated!

CodePudding user response:

You can convert your step column into categories and then count on the category codes:

import pandas as pd

df = pd.DataFrame({"step": [1,1,1,2,2,2,2,3,3,3,4,4,4,5,5,5,5,6,6,6,7,7,7,7,8,8,8,8,8,9,9,9,9,10]})
df["counter"] = df.step.astype("category").values.codes // 3

Result:

    step  counter
0      1        0
1      1        0
2      1        0
3      2        0
4      2        0
5      2        0
6      2        0
7      3        0
8      3        0
9      3        0
10     4        1
11     4        1
12     4        1
13     5        1
14     5        1
15     5        1
16     5        1
17     6        1
18     6        1
19     6        1
20     7        2
21     7        2
22     7        2
23     7        2
24     8        2
25     8        2
26     8        2
27     8        2
28     8        2
29     9        2
30     9        2
31     9        2
32     9        2
33    10        3

Update for changed data (see comment):
df = pd.DataFrame({"step": [1,1,1,2,2,2,2,3,3,3,4,4,4,5,5,5,5,6,6,6,7,7,7,7,8,8,8,8,8,9,9,9,9,7,7,7,8,8,8,9,9,7,7,8,8,8,9,9,9,7,7]})
df['counter'] = (df.step.diff().fillna(0).ne(0).cumsum() // 3).astype(int)

    step  counter
0      1        0
1      1        0
2      1        0
3      2        0
4      2        0
5      2        0
6      2        0
7      3        0
8      3        0
9      3        0
10     4        1
11     4        1
12     4        1
13     5        1
14     5        1
15     5        1
16     5        1
17     6        1
18     6        1
19     6        1
20     7        2
21     7        2
22     7        2
23     7        2
24     8        2
25     8        2
26     8        2
27     8        2
28     8        2
29     9        2
30     9        2
31     9        2
32     9        2
33     7        3
34     7        3
35     7        3
36     8        3
37     8        3
38     8        3
39     9        3
40     9        3
41     7        4
42     7        4
43     8        4
44     8        4
45     8        4
46     9        4
47     9        4
48     9        4
49     7        5
50     7        5

CodePudding user response:

Compare the current and previous row in step column to identify boundaries(location of transitions), then use cumsum to assign number to groups of rows and floor divide by 3 to create counter

m = df.step != df.step.shift()
df['counter'] = (m.cumsum() - 1) // 3

    step  counter
0      1        0
1      1        0
2      1        0
3      2        0
4      2        0
5      2        0
6      2        0
7      3        0
8      3        0
9      3        0
10     4        1
11     4        1
12     4        1
13     5        1
14     5        1
15     5        1
16     5        1
17     6        1
18     6        1
19     6        1
20     7        2
21     7        2
22     7        2
23     7        2
24     8        2
25     8        2
26     8        2
27     8        2
28     8        2
29     9        2
30     9        2
31     9        2
32     9        2
33     7        3
34     7        3
35     7        3
36     8        3
37     8        3
38     8        3
39     9        3
40     9        3
41     7        4
42     7        4
43     8        4
44     8        4
45     8        4
46     9        4
47     9        4
48     9        4
49     7        5
  • Related