Mark groups of rows in dataframe based on boolean sequence using pandas approach

Time: 08-06

I have a dataframe like this:

point switch_on
A True
B True
C True
A False
D False
N True
O False
O False
S False

I want to add another column that labels each group of consecutive rows sharing the same boolean value, so that the group number increases every time switch_on changes. Like this:

point switch_on group
A True 1
B True 1
C True 1
A False 2
D False 2
N True 3
O False 4
O False 4
S False 4

Answer:

Thank you for your question! This can be achieved with the pandas apply method and a small class that keeps track of the current group number and the value of the group currently being read (in this case a boolean). In my solution, the GroupCounter class has a count method that receives one switch_on value at a time: if the value matches the previous row's value, it returns the current group number; if it differs, it increments the number first and then returns it. Because apply calls count on the rows in order, the returned numbers can be assigned directly to a new column called group. See below:

import pandas as pd

df = pd.DataFrame.from_dict({
    "point": ["A", "B", "C", "A", "D", "N", "O", "O", "S"],
    "switch_on": [True, True, True, False, False, True, False, False, False]
})


class GroupCounter:
    def __init__(self):
        # number of the group currently being filled (1-based)
        self.group_number = 1
        # value shared by the current group; None until the first row is read
        self.current_group = None

    def count(self, value: bool) -> int:
        # current_group is only None before the first row has been read
        # this assumes no dataframe rows themselves have the value of None!!!
        if self.current_group is None:
            self.current_group = value

        # if the incoming value differs from the previous one,
        # increment the group number and remember the new group value
        if value != self.current_group:
            self.group_number += 1
            self.current_group = value

        return self.group_number


counter = GroupCounter()

df['group'] = df['switch_on'].apply(counter.count)

print(df.head(10))

Output:

  point  switch_on  group
0     A       True      1
1     B       True      1
2     C       True      1
3     A      False      2
4     D      False      2
5     N       True      3
6     O      False      4
7     O      False      4
8     S      False      4
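If you would rather not keep state on a class, the same sequential idea can be written as a plain function. This is a minimal sketch rather than part of the answer above: the name run_groups is hypothetical, and it uses a private sentinel object instead of None as the "nothing read yet" marker, so it no longer assumes the column is free of None values. Because the state lives in local variables, every call starts counting at 1 again, whereas a reused GroupCounter instance would continue from where it stopped.

```python
import pandas as pd


def run_groups(values):
    """Hypothetical helper: 1-based group id that increments whenever the value changes."""
    groups = []
    group_number = 0
    previous = object()  # sentinel that never equals a real row value
    for value in values:
        # a value different from the previous row starts a new group
        if value != previous:
            group_number += 1
            previous = value
        groups.append(group_number)
    return groups


df = pd.DataFrame({
    "point": ["A", "B", "C", "A", "D", "N", "O", "O", "S"],
    "switch_on": [True, True, True, False, False, True, False, False, False],
})

df["group"] = run_groups(df["switch_on"])
print(df["group"].tolist())  # [1, 1, 1, 2, 2, 3, 4, 4, 4]
```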

Answer:

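Identify every time there's a change:

  • This is wherever the current value df.switch_on doesn't equal the previous row's value df.switch_on.shift(). For the first row, shift() yields NaN, which never compares equal, so the first row always counts as a change.

Then take the cumulative sum (cumsum) of that boolean mask:

  • True is interpreted as 1 and False as 0, so the running total goes up by one at each change, and all rows in a run of identical values share the same group number.
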
df['group'] = df.switch_on.ne(df.switch_on.shift()).cumsum()
print(df)

Output:

  point  switch_on  group
0     A       True      1
1     B       True      1
2     C       True      1
3     A      False      2
4     D      False      2
5     N       True      3
6     O      False      4
7     O      False      4
8     S      False      4
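To see why the one-liner works, the three steps can be computed separately and printed side by side. This is a minimal sketch; the intermediate names previous, changed and group are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "point": ["A", "B", "C", "A", "D", "N", "O", "O", "S"],
    "switch_on": [True, True, True, False, False, True, False, False, False],
})

# Step 1: shift() moves every value down one row; the first row has no
# predecessor, so it becomes NaN
previous = df["switch_on"].shift()

# Step 2: ne() is True wherever the value differs from the previous row;
# NaN never equals anything, so the first row is always marked as a change
changed = df["switch_on"].ne(previous)

# Step 3: cumsum() counts the changes seen so far (True = 1, False = 0),
# so each run of identical values shares one group number
group = changed.cumsum()

print(pd.DataFrame({
    "switch_on": df["switch_on"],
    "previous": previous,
    "changed": changed,
    "group": group,
}))
```

The changed column is True exactly at rows 0, 3, 5 and 6, which are the rows where a new group starts.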