This is my dataframe:
df = pd.DataFrame({'a': [20, 1, 55, 333, 444, 1, 2, 10], 'b': [20, 20, 21, 21, 21, 22, 22, 22]})
I want to group them by column b
and two rows after each group.
This is the output that I need:
a b
0 20 20
1 1 20
2 55 21
3 333 21
a b
2 55 21
3 333 21
4 444 21
5 1 22
6 2 22
a b
5 1 22
6 2 22
7 10 22
I know that I need a mask. I have tried some of them but didn't work. This is one of my tries:
df.groupby(df.b.diff().cumsum().eq(1))
CodePudding user response:
You cannot achieve this just with a grouper as groups cannot overlap.
You can however use a groupby with side effects by referencing the original DataFrame:
(df.groupby('b')
.apply(lambda g: df.loc[g.index[0]:g.index[-1] 2])
)
Output:
a b
b
20 0 20 20
1 1 20
2 55 21
3 333 21
21 2 55 21
3 333 21
4 444 21
5 1 22
6 2 22
22 5 1 22
6 2 22
7 10 22
As a loop:
for k, g in df.groupby('b'):
print(k)
print(df.loc[g.index[0]:g.index[-1] 2])
Output:
20
a b
0 20 20
1 1 20
2 55 21
3 333 21
21
a b
2 55 21
3 333 21
4 444 21
5 1 22
6 2 22
22
a b
5 1 22
6 2 22
7 10 22