Home > other >  Fill grouping variable pandas dataframe
Fill grouping variable pandas dataframe

Time:01-03

I have a pandas dataframe with an id column called doc_ID and a boolean column that reports if a certain value is below a threshold, like so:

df = pd.DataFrame({'doc_ID': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3],
                  'below_threshold': [False, False, False, False, True, False, False, True, False, False,
                                     False, False, False, False, False, False, False, True, False, False]}) 

I'm trying to create a new grouping id within each doc_ID that would extend from the first False value in order until and including the first True value. Something like this

    doc_ID  below_threshold  new_group
0        1            False          1
1        1            False          1
2        1            False          1
3        1            False          1
4        1             True          1
5        1            False          2
6        2            False          3
7        2             True          3
8        2            False          4
9        2            False          4
10       2            False          4
11       2            False          4
12       3            False          5
13       3            False          5
14       3            False          5
15       3            False          5
16       3            False          5
17       3             True          5
18       3            False          6
19       3            False          6

CodePudding user response:

IIUC, use:

m1 = ~df['below_threshold']
m2 = df.groupby('doc_ID')['below_threshold'].shift(fill_value=True)
df['new_group'] = (m1&m2).cumsum()

Output:

    doc_ID  below_threshold  new_group
0        1            False          1
1        1            False          1
2        1            False          1
3        1            False          1
4        1             True          1
5        1            False          2
6        2            False          3
7        2             True          3
8        2            False          4
9        2            False          4
10       2            False          4
11       2            False          4
12       3            False          5
13       3            False          5
14       3            False          5
15       3            False          5
16       3            False          5
17       3             True          5
18       3            False          6
19       3            False          6
  • Related