This is my pandas dataframe:
import pandas as pd
df = pd.DataFrame({'a': [10, 20, 1, 55, 66, 333, 444, 1, 2, 10], 'b': [1, 1, 1, -1, -1, -1, -1, 1, 1, -1]})
And this is how I need it to look after using groupby. I want all of the 1s in b, followed by the two -1s that come right after each streak of 1s. For example, the first group is all of the consecutive 1s and then, once the streak ends, two -1s. If the streak of -1s is shorter than two, just give the single -1, as in group two of the example:
a b
0 10 1
1 20 1
2 1 1
3 55 -1
4 66 -1
a b
7 1 1
8 2 1
9 10 -1
I know that I need a mask. I have tried a few, but none of them worked. These are some of my attempts:
df.groupby(df.b.diff().cumsum().eq(1))
df.groupby(df['b'].ne(df['b'].shift()).cumsum())
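A quick way to see why these attempts don't produce the wanted groups is to print the keys they build. This is an illustrative check on the example frame; the variable names run_id and key are just for the demo. The second attempt numbers every run of equal values, so each run of 1s and each run of -1s gets its own id and the trailing -1s never share a group with the 1s before them. The first attempt's diff().cumsum() is b minus its first value (NaN on row 0), which is never 1, so everything lands in one group.

```python
import pandas as pd

df = pd.DataFrame({'a': [10, 20, 1, 55, 66, 333, 444, 1, 2, 10],
                   'b': [1, 1, 1, -1, -1, -1, -1, 1, 1, -1]})

# Attempt 2: a new id starts whenever 'b' differs from the previous row
run_id = df['b'].ne(df['b'].shift()).cumsum()
print(run_id.tolist())  # [1, 1, 1, 2, 2, 2, 2, 3, 3, 4]

# Attempt 1: diff().cumsum() is b - b[0] (NaN on the first row),
# which is 0 or -2 here, so eq(1) is False everywhere
key = df['b'].diff().cumsum().eq(1)
print(key.unique().tolist())  # [False] -> a single group
```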
Answer:
The logic is not fully clear.
Here I assume you want to start a new group whenever "b" goes from -1 back to 1. For the demo, I also add a condition on "a":
# restart group on -1 -> 1
group = df['b'].diff().gt(0).cumsum()
# drop rows with a≥100
m = df['a'].lt(100)
for k, g in df[m].groupby(group):
    print(g)
Output:
a b
0 10 1
1 20 1
2 1 1
3 55 -1
4 66 -1
a b
7 1 1
8 2 1
9 10 -1
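To make the first approach easier to follow, here is a small sketch that prints the intermediate group key and shows what happens without the demo filter on "a". It only reuses the calls from the answer; the printed index lists are added for illustration. The filter is what removes rows 5 and 6; without it, the first group keeps all four -1s.

```python
import pandas as pd

df = pd.DataFrame({'a': [10, 20, 1, 55, 66, 333, 444, 1, 2, 10],
                   'b': [1, 1, 1, -1, -1, -1, -1, 1, 1, -1]})

# diff() is +2 exactly where b jumps from -1 back to 1, so cumsum bumps the key there
group = df['b'].diff().gt(0).cumsum()
print(group.tolist())  # [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]

# Without the a < 100 filter, rows 5 and 6 stay in the first group
for k, g in df.groupby(group):
    print(k, g.index.tolist())
# 0 [0, 1, 2, 3, 4, 5, 6]
# 1 [7, 8, 9]
```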
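Update: to keep every 1 of a streak plus at most the two rows that follow it, start a new group at the first 1 of each streak and keep the first n + 2 rows of each group, where n is the number of 1s in that group:
m = df['b'].eq(1)
# a new group starts at the first 1 of each streak
group = (m&m.ne(m.shift())).cumsum()
# number of 1s in each group
n = m.groupby(group).sum()
# keep the 1s plus the next two rows (the first two -1s, if present)
df.groupby(group, group_keys=False).apply(lambda g: g.head(n[g.name] + 2))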
Output:
a b
0 10 1
1 20 1
2 1 1
3 55 -1
4 66 -1
7 1 1
8 2 1
9 10 -1
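If you'd rather avoid apply, a vectorized variant (a swapped-in technique, not the one used above) builds the boolean mask directly: count the -1s cumulatively inside each group and keep every 1 plus the -1s whose count is at most 2. The names neg_count, keep and result are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'a': [10, 20, 1, 55, 66, 333, 444, 1, 2, 10],
                   'b': [1, 1, 1, -1, -1, -1, -1, 1, 1, -1]})

m = df['b'].eq(1)
# same group key as in the update: a new group starts at the first 1 of each streak
group = (m & m.ne(m.shift())).cumsum()

# running count of -1s inside each group: 0 on the 1s, then 1, 2, 3, ... on the -1s
neg_count = (~m).astype(int).groupby(group).cumsum()

# keep every 1 and at most the first two -1s after each streak
keep = m | neg_count.le(2)
result = df[keep]

for k, g in result.groupby(group[keep]):
    print(k)
    print(g)
```

Rows before the first 1 (if any) fall into group 0 and keep their first two -1s, which matches the apply version, where n is 0 for that group.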
The same thing as a for loop:
m = df['b'].eq(1)
group = (m&m.ne(m.shift())).cumsum()
n = m.groupby(group).sum()
for k, g in df.groupby(group):
    print(k)
    print(g.head(n[k] + 2))
Output:
1
a b
0 10 1
1 20 1
2 1 1
3 55 -1
4 66 -1
2
a b
7 1 1
8 2 1
9 10 -1
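If you need the groups as objects rather than printed output, the loop can be wrapped in a small function that returns a list of DataFrames and makes the number of trailing rows a parameter. This is a sketch; the name streaks_with_tail and its col and n_tail parameters are hypothetical, and n_tail=2 reproduces the loop above.

```python
import pandas as pd


def streaks_with_tail(df, col='b', n_tail=2):
    """Hypothetical helper: one DataFrame per streak of 1s in `col`,
    plus at most `n_tail` of the rows that follow the streak."""
    m = df[col].eq(1)
    # a new group starts at the first 1 of each streak
    group = (m & m.ne(m.shift())).cumsum()
    # number of 1s in each group
    n = m.groupby(group).sum()
    return [g.head(n[k] + n_tail) for k, g in df.groupby(group)]


df = pd.DataFrame({'a': [10, 20, 1, 55, 66, 333, 444, 1, 2, 10],
                   'b': [1, 1, 1, -1, -1, -1, -1, 1, 1, -1]})

parts = streaks_with_tail(df)
for part in parts:
    print(part)
```

With n_tail=1, the first group would stop at row 3 instead of row 4.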