Groupby streak of numbers and a mask

Time:01-17

This is my pandas dataframe:

import pandas as pd
df = pd.DataFrame({'a': [10, 20, 1, 55, 66, 333, 444, 1, 2, 10], 'b': [1, 1, 1, -1, -1, -1, -1, 1, 1, -1]})

This is the result I need after using groupby: each group should contain all of the consecutive 1s in b, followed by the first two -1s after that streak ends. For example, the first group is the streak of 1s in rows 0 to 2 plus the two -1s in rows 3 and 4. If fewer than two -1s follow the streak, the group should contain only the -1 that is there, as in the second group of the example:

     a  b
0   10  1
1   20  1
2    1  1
3   55 -1
4   66 -1

     a  b
7    1  1
8    2  1
9   10 -1

I know I need a mask. I have tried several, but none of them worked. Here are some of my attempts:

df.groupby(df.b.diff().cumsum().eq(1))
df.groupby(df['b'].ne(df['b'].shift()).cumsum())
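To see why these two attempts don't produce the wanted groups, it can help to print the keys they generate. A minimal sketch, reusing the question's df (key1 and key2 are just illustrative names):

```python
import pandas as pd

df = pd.DataFrame({'a': [10, 20, 1, 55, 66, 333, 444, 1, 2, 10],
                   'b': [1, 1, 1, -1, -1, -1, -1, 1, 1, -1]})

# Attempt 1: the cumulative sum of the differences is never 1 in this data,
# so every row gets the key False and groupby sees a single group
key1 = df.b.diff().cumsum().eq(1)

# Attempt 2: a new id every time the value of b changes, so each streak of 1s
# and each streak of -1s ends up in its own group
key2 = df['b'].ne(df['b'].shift()).cumsum()

print(key1.tolist())
print(key2.tolist())
print(df.groupby(key1).ngroups, df.groupby(key2).ngroups)
```

Because b only holds 1 and -1, b.diff().cumsum() equals b[i] - b[0] from the second row on (the first row is NaN), which is always 0 or -2 here, so .eq(1) is never true. The second key gives each run its own id, which separates the 1s from the -1s that are supposed to join them.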

Answer:

The logic is not fully clear.

I'll assume you want to start a new group whenever "b" switches from -1 to 1. For the demo, let's also add a condition on "a":

# restart group on -1 -> 1
group = df['b'].diff().gt(0).cumsum()
# drop rows with a≥100
m = df['a'].lt(100)

for k,g in df[m].groupby(group):
    print(g)

Output:


    a  b
0  10  1
1  20  1
2   1  1
3  55 -1
4  66 -1
    a  b
7   1  1
8   2  1
9  10 -1
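One way to check what this grouper does is to print its labels and compare the groups with and without the filter on a. A short sketch, reusing the question's df; nothing beyond pandas is assumed:

```python
import pandas as pd

df = pd.DataFrame({'a': [10, 20, 1, 55, 66, 333, 444, 1, 2, 10],
                   'b': [1, 1, 1, -1, -1, -1, -1, 1, 1, -1]})

# diff() is 2 only where b goes from -1 to 1, so gt(0) marks the first row
# of each later streak of 1s (the first row's diff is NaN and compares False)
group = df['b'].diff().gt(0).cumsum()
print(group.tolist())

# Rows in each group without the filter on a, then with it
for k, g in df.groupby(group):
    print(k, g.index.tolist())
for k, g in df[df['a'].lt(100)].groupby(group):
    print(k, g.index.tolist())
```

In this data group 0 covers rows 0 to 6, so it is the a < 100 mask, rather than the grouping itself, that trims rows 5 and 6. The grouping alone does not stop after two -1s.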

Update:

m = df['b'].eq(1)
group = (m&m.ne(m.shift())).cumsum()
n = m.groupby(group).sum()

df.groupby(group, group_keys=False).apply(lambda g: g.head(n[g.name] + 2))

Output:


    a  b
0  10  1
1  20  1
2   1  1
3  55 -1
4  66 -1
7   1  1
8   2  1
9  10 -1
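The same rows can also be selected without apply, by expressing the rule as a boolean mask. This is a variant, not the answer's own method: it assumes b contains only 1 and -1, and it relies on the 1s coming first inside each group, which the grouping above guarantees for every group that starts with a 1. The names minus_seen and keep are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': [10, 20, 1, 55, 66, 333, 444, 1, 2, 10],
                   'b': [1, 1, 1, -1, -1, -1, -1, 1, 1, -1]})

m = df['b'].eq(1)
# Same grouping as the answer: a new group starts at the first 1 of each streak
group = (m & m.ne(m.shift())).cumsum()

# Running count of -1s seen so far within each group (cast to int before cumsum)
minus_seen = (~m).astype(int).groupby(group).cumsum()
# Keep every 1 (count is still 0) and at most the first two -1s after it
keep = minus_seen.le(2)

print(df[keep])
for k, g in df[keep].groupby(group):
    print(g)
```

Changing le(2) changes how many -1s are kept after each streak.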

As a for loop:

m = df['b'].eq(1)
group = (m&m.ne(m.shift())).cumsum()
n = m.groupby(group).sum()

for k, g in df.groupby(group, group_keys=False):
    print(k)
    print(g.head(n[k] + 2))

Output:

1
    a  b
0  10  1
1  20  1
2   1  1
3  55 -1
4  66 -1
2
    a  b
7   1  1
8   2  1
9  10 -1
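If the groups are needed for further processing rather than printing, the loop can be wrapped in a function that returns a list of DataFrames. A sketch only; the function name split_streaks and the n_after parameter are hypothetical, introduced here to make the number of trailing -1s configurable:

```python
import pandas as pd


def split_streaks(df, col='b', n_after=2):
    """Hypothetical helper: one DataFrame per streak of 1s in `col`,
    each followed by at most `n_after` of the rows after that streak."""
    m = df[col].eq(1)
    group = (m & m.ne(m.shift())).cumsum()
    n = m.groupby(group).sum()  # number of 1s in each group
    return [g.head(n[k] + n_after) for k, g in df.groupby(group)]


df = pd.DataFrame({'a': [10, 20, 1, 55, 66, 333, 444, 1, 2, 10],
                   'b': [1, 1, 1, -1, -1, -1, -1, 1, 1, -1]})

parts = split_streaks(df)
for p in parts:
    print(p)
```

Calling split_streaks(df, n_after=1) would keep only the first -1 after each streak.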