Mark groups of rows in a dataframe based on a boolean sequence using pandas

I have a dataframe like this:

point	switch_on
A	True
B	True
C	True
A	False
D	False
N	True
O	False
O	False
S	False

I want to add another column that labels each run of consecutive rows sharing the same boolean value, starting a new group every time the value switches. Like this:

point	switch_on	group
A	True	1
B	True	1
C	True	1
A	False	2
D	False	2
N	True	3
O	False	4
O	False	4
S	False	4

CodePudding user response：

This can be done with the pandas apply method and a small class that keeps track of the current group number and the value of the last row read (in this case a boolean). In my solution, the class GroupCounter has a count method that accepts a boolean value: if the value matches the previous row it returns the current group number, and if the value has changed it increments the group number first. The resulting values are then assigned to a new column called group. See below:

import pandas as pd

df = pd.DataFrame.from_dict({
    "point": ["A", "B", "C", "A", "D", "N", "O", "O", "S"],
    "switch_on": [True, True, True, False, False, True, False, False, False]
})


class GroupCounter:
    group_number = 1
    current_group = None

    def count(self, value: bool):
        # current_group is None only if no rows have been read yet
        # this assumes no dataframe row itself has the value None
        if self.current_group is None:
            self.current_group = value

        # if the incoming value is different than that of the previous one
        # then increment the group number and capture the new group value
        if value != self.current_group:
            self.group_number += 1
            self.current_group = value

        return self.group_number


counter = GroupCounter()

df['group'] = df['switch_on'].apply(counter.count)

print(df.head(10))

Outputs:

  point  switch_on  group
0     A       True      1
1     B       True      1
2     C       True      1
3     A      False      2
4     D      False      2
5     N       True      3
6     O      False      4
7     O      False      4
8     S      False      4
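
Note that the counter object keeps its state between calls, so a fresh GroupCounter is needed for every column you label. As a rough sketch of the same idea without a class, the loop below does the bookkeeping in a plain function. The name label_runs is made up for this example and is not part of pandas, and the sentinel object is an assumption used to avoid the None check:

```python
import pandas as pd


def label_runs(values):
    """Return a 1-based label per item; the label increases whenever the value changes.

    label_runs is a hypothetical helper for illustration, not a pandas function.
    """
    labels = []
    group_number = 0
    previous = object()  # sentinel: compares unequal to any real value
    for value in values:
        if value != previous:
            # value differs from the previous row, so a new group starts here
            group_number += 1
            previous = value
        labels.append(group_number)
    return labels


df = pd.DataFrame({
    "point": ["A", "B", "C", "A", "D", "N", "O", "O", "S"],
    "switch_on": [True, True, True, False, False, True, False, False, False],
})

# tolist() hands the function plain Python bools
df["group"] = label_runs(df["switch_on"].tolist())
print(df)
```

Because all state lives inside the function call, calling it again on another column starts counting from 1 again.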

CodePudding user response：

Identify every time there's a change:

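A change happens wherever the current value df.switch_on doesn't equal the previous row's value df.switch_on.shift(). The first row always counts as a change, because shift() leaves it with a missing value.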

Then take the cumsum of those change flags:

True is interpreted as 1, and False as 0, so the running total goes up by one at the start of every new group.

df['group'] = df.switch_on.ne(df.switch_on.shift()).cumsum()
print(df)

Output:

  point  switch_on  group
0     A       True      1
1     B       True      1
2     C       True      1
3     A      False      2
4     D      False      2
5     N       True      3
6     O      False      4
7     O      False      4
8     S      False      4
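
To make the one-liner above easier to follow, here is a small sketch that splits it into its intermediate steps. The variable names previous, changed and group, and the helper frame steps, are only for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "point": ["A", "B", "C", "A", "D", "N", "O", "O", "S"],
    "switch_on": [True, True, True, False, False, True, False, False, False],
})

# Step 1: the previous row's value; the first row has no predecessor, so it is NaN
previous = df["switch_on"].shift()

# Step 2: True wherever the value differs from the previous row (start of a new group)
changed = df["switch_on"].ne(previous)

# Step 3: running total of the change flags gives the group number
group = changed.cumsum()

# Side-by-side view of every step
steps = pd.DataFrame({
    "switch_on": df["switch_on"],
    "previous": previous,
    "changed": changed,
    "group": group,
})
print(steps)
```

The changed column is True at rows 0, 3, 5 and 6, which is exactly where the group number goes up.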