I have a pandas dataframe with an id column called doc_ID
and a boolean column that reports if a certain value is below a threshold, like so:
df = pd.DataFrame({'doc_ID': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3],
'below_threshold': [False, False, False, False, True, False, False, True, False, False,
False, False, False, False, False, False, False, True, False, False]})
I'm trying to create a new grouping id within each doc_ID
that would extend from the first False
value in order until and including the first True
value. Something like this
doc_ID below_threshold new_group
0 1 False 1
1 1 False 1
2 1 False 1
3 1 False 1
4 1 True 1
5 1 False 2
6 2 False 3
7 2 True 3
8 2 False 4
9 2 False 4
10 2 False 4
11 2 False 4
12 3 False 5
13 3 False 5
14 3 False 5
15 3 False 5
16 3 False 5
17 3 True 5
18 3 False 6
19 3 False 6
CodePudding user response:
IIUC, use:
m1 = ~df['below_threshold']
m2 = df.groupby('doc_ID')['below_threshold'].shift(fill_value=True)
df['new_group'] = (m1&m2).cumsum()
Output:
doc_ID below_threshold new_group
0 1 False 1
1 1 False 1
2 1 False 1
3 1 False 1
4 1 True 1
5 1 False 2
6 2 False 3
7 2 True 3
8 2 False 4
9 2 False 4
10 2 False 4
11 2 False 4
12 3 False 5
13 3 False 5
14 3 False 5
15 3 False 5
16 3 False 5
17 3 True 5
18 3 False 6
19 3 False 6