Groupby streak of numbers and a mask-CodePudding

This is my pandas dataframe:

import pandas as pd

df = pd.DataFrame({'a': [10, 20, 1, 55, 66, 333, 444, 1, 2, 10], 'b': [1,1, 1, -1, -1, -1, -1, 1, 1, -1]})

And this is the result I need after using groupby. Each group should contain a streak of consecutive 1s in b plus the first two -1s that follow it. For example, the first group is all of the consecutive 1s, and after that streak ends I want two -1s. If the run of -1s after a streak is shorter than two, the group just takes the single -1 that is there, which is group two in the example:

I know that I need a mask. I have tried a few, but none of them worked. These are some of my attempts:

df.groupby(df.b.diff().cumsum().eq(1))
df.groupby(df['b'].ne(df['b'].shift()).cumsum())

CodePudding user response：

The logic is not fully clear.

Assuming you want to start a new group whenever "b" switches from -1 to 1, and adding a condition on "a" for the demo:

# restart group on -1 -> 1
group = df['b'].diff().gt(0).cumsum()
# drop rows with a≥100
m = df['a'].lt(100)

for k,g in df[m].groupby(group):
    print(g)

Output:
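For reference, here is a self-contained sketch of the snippet above that lists which row labels end up in each group. The name rows_per_group is only introduced for this illustration, and the a < 100 filter is the demo condition from the answer, not part of the question.

```python
import pandas as pd

df = pd.DataFrame({'a': [10, 20, 1, 55, 66, 333, 444, 1, 2, 10],
                   'b': [1, 1, 1, -1, -1, -1, -1, 1, 1, -1]})

# diff() is positive only where b jumps from -1 to 1, so the cumulative
# sum of that flag goes up by one at the start of each new streak of 1s
group = df['b'].diff().gt(0).cumsum()

# demo condition only: keep rows where a < 100
m = df['a'].lt(100)

# hypothetical helper dict: group key -> row labels in that group
rows_per_group = {int(k): g.index.tolist() for k, g in df[m].groupby(group)}
print(rows_per_group)  # {0: [0, 1, 2, 3, 4], 1: [7, 8, 9]}
```

Note that the first group stops at row 4 only because rows 5 and 6 happen to have a ≥ 100. The grouping itself does not limit the number of -1s.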

update

m = df['b'].eq(1)
group = (m&m.ne(m.shift())).cumsum()
n = m.groupby(group).sum()

# keep, per group, the streak of 1s plus at most two following -1s
df.groupby(group, group_keys=False).apply(lambda g: g.head(n[g.name]+2))

Output:
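To make the mask logic easier to follow, the sketch below prints the intermediate values for the example frame. The variable starts is a name added here for illustration. It is the m&m.ne(m.shift()) part of the expression above.

```python
import pandas as pd

df = pd.DataFrame({'a': [10, 20, 1, 55, 66, 333, 444, 1, 2, 10],
                   'b': [1, 1, 1, -1, -1, -1, -1, 1, 1, -1]})

m = df['b'].eq(1)               # True on rows where b is 1

# True only on the first row of each streak of 1s: the row is a 1
# and differs from the previous row (the very first row counts too)
starts = m & m.ne(m.shift())

group = starts.cumsum()         # group id increases at each streak start
n = m.groupby(group).sum()      # number of 1s in each group

print(starts.tolist())
print(group.tolist())           # [1, 1, 1, 1, 1, 1, 1, 2, 2, 2]
print(n.tolist())               # [3, 2]
```

With these values, head(n + 2) keeps 3 + 2 = 5 rows of the first group and up to 2 + 2 = 4 rows of the second, which has only three rows.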

as a for loop:

m = df['b'].eq(1)
group = (m&m.ne(m.shift())).cumsum()
n = m.groupby(group).sum()

for k, g in df.groupby(group, group_keys=False):
    print(k)
    print(g.head(n[k]+2))

Output:
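If only the filtered rows are needed, a variant without apply or a loop is possible. This is a different technique from the one above, sketched under the same assumptions: it compares each row's position inside its group (cumcount) with a per-row limit. The names pos, limit and out are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [10, 20, 1, 55, 66, 333, 444, 1, 2, 10],
                   'b': [1, 1, 1, -1, -1, -1, -1, 1, 1, -1]})

m = df['b'].eq(1)
group = (m & m.ne(m.shift())).cumsum()
n = m.groupby(group).sum()

# position of each row inside its group: 0, 1, 2, ...
pos = df.groupby(group).cumcount()

# per-row limit: number of 1s in the row's group, plus two
limit = group.map(n) + 2

# keep the rows that fall within the limit
out = df[pos < limit]
print(out.index.tolist())  # [0, 1, 2, 3, 4, 7, 8, 9]
```

The result can still be grouped afterwards with out.groupby(group), since the group Series is aligned on the index.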