Group by sequence of True

Time:09-21

I have the following df:

import pandas as pd

df = pd.DataFrame({"val_a":[True,True,False,False,False,True,False,False,True,True,True,True,False,True,True]})


    val_a
0   True
1   True
2   False
3   False
4   False
5   True
6   False
7   False
8   True
9   True
10  True
11  True
12  False
13  True
14  True

and I wish to have the following result:

    val_a   tx
0   True     0
1   True     0
2   False    None
3   False    None
4   False    None
5   True     1
6   False    None
7   False    None
8   True     2
9   True     2
10  True     2
11  True     2
12  False    None
13  True     3
14  True     3  

Explanation: each run of consecutive True values counts as one group. Indices 0 and 1 form the first run, so both get tx 0; the single True at index 5 forms the next run, so it gets tx 1. False rows get no group (None).

What I have tried: I know that cumsum and groupby must come into play here, but I couldn't figure out how.

g  = (df['val_a']==True).cumsum()
df['tx'] = df.groupby(g).ffill()

CodePudding user response:

Identify the groups with cumsum, then keep only the rows that hold True values and use factorize to assign an ordinal number to each unique group:

m = df['val_a']
df.loc[m, 'tx'] = (~m).cumsum()[m].factorize()[0]
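
To see why the one-liner works, it may help to split it into its intermediate steps. This is only a sketch on the sample data from the question; the names run_id, true_runs, codes and uniques are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"val_a": [True, True, False, False, False, True, False, False,
                             True, True, True, True, False, True, True]})

m = df["val_a"]

# Every False row bumps the counter, so all rows in one run of
# consecutive True values share the same counter value.
run_id = (~m).cumsum()

# Keep only the True rows. Their counter values are not consecutive yet.
true_runs = run_id[m]

# factorize relabels the distinct values as 0, 1, 2, ... in order of
# first appearance, which gives the ordinal number of each run.
codes, uniques = pd.factorize(true_runs)

# Write the codes back to the True rows only; False rows stay missing.
df.loc[m, "tx"] = codes
```

Here true_runs holds 0, 0, 3, 5, 5, 5, 5, 6, 6 and codes holds 0, 0, 1, 2, 2, 2, 2, 3, 3.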

Alternatively, you can use groupby with ngroup:

m = df['val_a']
df['tx'] = m[m].groupby((~m).cumsum()).ngroup()

    val_a   tx
0    True  0.0
1    True  0.0
2   False  NaN
3   False  NaN
4   False  NaN
5    True  1.0
6   False  NaN
7   False  NaN
8    True  2.0
9    True  2.0
10   True  2.0
11   True  2.0
12  False  NaN
13   True  3.0
14   True  3.0
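
The output above shows tx as floats with NaN, because a plain integer column cannot hold missing values. If whole numbers with a missing marker are preferred, as in the desired result, one option is to convert the column to pandas' nullable Int64 dtype. A minimal sketch using the ngroup variant, assuming a pandas version that supports the nullable integer dtype:

```python
import pandas as pd

df = pd.DataFrame({"val_a": [True, True, False, False, False, True, False, False,
                             True, True, True, True, False, True, True]})

m = df["val_a"]

# Only the True rows are grouped, so after index alignment on assignment
# the False rows are left as NaN and the column becomes float.
df["tx"] = m[m].groupby((~m).cumsum()).ngroup()

# Nullable integer dtype: values show as 0, 1, 2, 3 and <NA>.
df["tx"] = df["tx"].astype("Int64")
```

Missing entries then display as <NA> rather than None, but df["tx"].isna() still identifies them.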