I have a DataFrame like this:
point | switch_on |
---|---|
A | True |
B | True |
C | True |
A | False |
D | False |
N | True |
O | False |
O | False |
S | False |
I want to add another column that numbers each run of consecutive rows sharing the same boolean value, so the number increases every time the value switches. Like this:
point | switch_on | group |
---|---|---|
A | True | 1 |
B | True | 1 |
C | True | 1 |
A | False | 2 |
D | False | 2 |
N | True | 3 |
O | False | 4 |
O | False | 4 |
S | False | 4 |
CodePudding user response:
Thank you for your question! This can be achieved using the pandas `apply` method with a class that maintains the current count and the value of the last group read (in this case a boolean). In my solution, the class `GroupCounter` has a `count` method, which accepts a boolean value and either returns the current count or increments it first, depending on the `switch_on` value being passed in. The resulting values are then assigned to a new column called `group`. See below:
import pandas as pd

df = pd.DataFrame.from_dict({
    "point": ["A", "B", "C", "A", "D", "N", "O", "O", "S"],
    "switch_on": [True, True, True, False, False, True, False, False, False]
})

class GroupCounter:
    group_number = 1
    current_group = None

    def count(self, value: bool) -> int:
        # current_group will only be None if no rows have been read yet
        # this assumes no dataframe rows themselves have the value of None!!!
        if self.current_group is None:
            self.current_group = value
        # if the incoming value is different from the previous one,
        # increment the group number and capture the new group value
        if value != self.current_group:
            self.group_number += 1
            self.current_group = value
        return self.group_number

counter = GroupCounter()
df['group'] = df['switch_on'].apply(counter.count)
print(df.head(10))
Outputs:
  point  switch_on  group
0     A       True      1
1     B       True      1
2     C       True      1
3     A      False      2
4     D      False      2
5     N       True      3
6     O      False      4
7     O      False      4
8     S      False      4
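As a cross-check on the stateful counter above, the same run numbering can be sketched without pandas using `itertools.groupby` from the standard library, which batches consecutive equal values. This is a different technique from the `apply` approach, and `run_ids` is a hypothetical helper name introduced here purely for illustration.

```python
from itertools import groupby


def run_ids(values):
    """Return a 1-based run number for every element of `values`.

    `run_ids` is a hypothetical helper, not part of pandas or the answer above.
    """
    ids = []
    # groupby yields one (key, run) pair per stretch of consecutive equal values
    for group_number, (_, run) in enumerate(groupby(values), start=1):
        ids.extend(group_number for _ in run)
    return ids


switch_on = [True, True, True, False, False, True, False, False, False]
print(run_ids(switch_on))  # [1, 1, 1, 2, 2, 3, 4, 4, 4]
```

The resulting list has one entry per input value, so it could be assigned to a column directly, for example `df['group'] = run_ids(df['switch_on'])`.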
CodePudding user response:
Identify every time there's a change:
- This is when the current value `df.switch_on` doesn't equal the previous value `df.switch_on.shift()`.

Then take the `cumsum`: `True` is interpreted as `1`, and `False` as `0`.
df['group'] = df.switch_on.ne(df.switch_on.shift()).cumsum()
print(df)
Output:
  point  switch_on  group
0     A       True      1
1     B       True      1
2     C       True      1
3     A      False      2
4     D      False      2
5     N       True      3
6     O      False      4
7     O      False      4
8     S      False      4
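To see why the one-liner works, it may help to look at each intermediate step on its own. The sketch below rebuilds the sample data from the question and splits the expression into parts; the names `previous`, `changed`, and `steps` are illustrative only and do not appear in the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "point": ["A", "B", "C", "A", "D", "N", "O", "O", "S"],
    "switch_on": [True, True, True, False, False, True, False, False, False],
})

# Step 1: the previous row's value; the first row has no predecessor, so it is NaN
previous = df["switch_on"].shift()

# Step 2: True wherever the value differs from the previous row
# (the first row compares against NaN, so it also counts as a change)
changed = df["switch_on"].ne(previous)

# Step 3: a running total of the changes; True adds 1, False adds 0
group = changed.cumsum()

# Side-by-side view of every step (illustrative only)
steps = pd.DataFrame({
    "switch_on": df["switch_on"],
    "previous": previous,
    "changed": changed,
    "group": group,
})
print(steps)
```

Because the first row always registers as a change, the numbering starts at 1 rather than 0.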