Say that I have a df like this:
Value
0 True
1 True
2 False
3 False
4 False
5 True
6 True
7 False
8 True
9 True
And say that I want to assign each group of True values a label, such that consecutive True values get the same label because they constitute a cluster, whereas False values always get 0:
Value Label
0 True 1
1 True 1
2 False 0
3 False 0
4 False 0
5 True 2
6 True 2
7 False 0
8 True 3
9 True 3
How could I do this in pandas?
CodePudding user response:
Try this:
>>> df['Label'] = df[df['Value']].index.to_series().diff().ne(1).cumsum()
>>> df
Value Label
0 True 1.0
1 True 1.0
2 False NaN
3 False NaN
4 False NaN
5 True 2.0
6 True 2.0
7 False NaN
8 True 3.0
9 True 3.0
>>>
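Note that the result above has NaN (and float labels) for the False rows, whereas the question asks for 0 and integers. One possible way to close that gap is to reindex the labels with a fill value. This is only a sketch: it rebuilds the sample df from the question, and like the one-liner above it assumes a default integer index (0, 1, 2, ...), since it detects clusters from gaps in the index.

```python
import pandas as pd

# Sample data from the question (assumes the default RangeIndex).
df = pd.DataFrame({'Value': [True, True, False, False, False,
                             True, True, False, True, True]})

# Index positions of the True rows; a step other than 1 starts a new cluster.
true_idx = df[df['Value']].index.to_series()
labels = true_idx.diff().ne(1).cumsum()

# False rows are absent from `labels`; give them 0 instead of NaN.
df['Label'] = labels.reindex(df.index, fill_value=0)
print(df)
```

With the sample data this gives labels 1, 1, 0, 0, 0, 2, 2, 0, 3, 3.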
CodePudding user response:
Here is another approach that is fully independent of the index:
m = df['Value']
df['Label'] = m.ne(m.shift()).cumsum().where(m)//2 + df['Value'].iloc[0]
Explanation: start a new group whenever successive values differ, keep only the True groups, floor-divide the group number by two to account for the alternating True/False groups, and add 1 if the first item is True to correct the initial group number. The False rows are left as NaN; chain .fillna(0).astype(int) if you need 0 as in the question.
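The steps of that explanation can be traced one at a time. The following is a sketch on the sample df from the question; the intermediate names (runs, true_runs, labels) are made up for illustration, and the last step (filling 0 and casting to int) is an addition to match the output requested in the question.

```python
import pandas as pd

df = pd.DataFrame({'Value': [True, True, False, False, False,
                             True, True, False, True, True]})
m = df['Value']

# 1. Start a new run whenever the value differs from the previous row.
runs = m.ne(m.shift()).cumsum()            # 1 1 2 2 2 3 3 4 5 5

# 2. Keep only the True runs (False rows become NaN).
true_runs = runs.where(m)                  # 1 1 NaN NaN NaN 3 3 NaN 5 5

# 3. Runs alternate True/False, so halve the run number; add 1 when the
#    first row is True, because the True runs are then the odd-numbered ones.
labels = true_runs // 2 + int(m.iloc[0])   # 1 1 NaN NaN NaN 2 2 NaN 3 3

# 4. Addition: use 0 for the False rows and integer labels.
df['Label'] = labels.fillna(0).astype(int)
print(df)
```

Because nothing here looks at the index, the same code should work unchanged on a DataFrame with a non-consecutive or non-numeric index.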