After a groupby, I'd like to keep only the rows whose difference from the previous row in the same group is less than 20.
Here is my original table:
A B Value
1 1 0
1 2 10
1 2 20
1 2 25
2 1 0
2 1 15
2 1 100
After df.groupby(['A', 'B']), the groups would look like this:
A B Value
1 1     0
  2    10
  2    20
  2    25
2 1     0
  1    15
  1   100
Within each group, I would like to keep only the rows whose difference from the previous row is less than 20.
For example,
Group[A=1][B=1] row1 has no previous row, so filter it out.
Group[A=1][B=2] row1 has no previous row, so filter it out.
Group[A=1][B=2] row2 has a difference of 10, so keep it.
Group[A=2][B=1] row3 has a difference of 85 (100 - 15), so filter it out.
The results I expect:
A B Value
1 2 20
1 2 25
2 1 15
Here is what I tried:
df.groupby(['A', 'B']).filter(lambda x : (x['Value'] - x['Value'].shift(1) < 20).any())
But it does not work: it returns whole groups instead of filtering individual rows.
Any advice would be greatly appreciated.
CodePudding user response:
Use diff within each group and keep the rows where it is less than 20:
out = df[df.groupby(['A','B']).Value.diff().lt(20)]
Out[398]:
A B Value
2 1 2 20
3 1 2 25
5 2 1 15
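For anyone who wants to reproduce this, here is a minimal self-contained sketch. The DataFrame is rebuilt from the table in the question, and the names `diffs` and `mask` are mine, not part of the answer above. The first row of each group has no predecessor, so its diff is NaN, and comparing NaN with lt(20) gives False, which is why those rows drop out without extra handling.

```python
import pandas as pd

# Data rebuilt from the table in the question.
df = pd.DataFrame({
    "A": [1, 1, 1, 1, 2, 2, 2],
    "B": [1, 2, 2, 2, 1, 1, 1],
    "Value": [0, 10, 20, 25, 0, 15, 100],
})

# Difference from the previous row within each (A, B) group.
# The first row of every group gets NaN.
diffs = df.groupby(["A", "B"])["Value"].diff()

# NaN.lt(20) is False, so rows without a previous row are excluded.
mask = diffs.lt(20)

out = df[mask]
print(out)
```

Note that lt(20) also accepts negative differences. If the values can decrease within a group and you mean the absolute gap, diffs.abs().lt(20) may be closer to what you want.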
CodePudding user response:
Another option is to build a row-wise boolean mask with transform and index with it:
cond1 = df.groupby(['A', 'B'])['Value'].transform(lambda x: x.diff(1).lt(20))
df[cond1]
A B Value
2 1 2 20
3 1 2 25
5 2 1 15
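As for why the attempt in the question does not give the expected rows: GroupBy.filter expects the function to return a single True/False per group, and it keeps or drops the whole group accordingly. The sketch below runs the question's filter call next to a row-wise mask on the same data. The names `whole_groups` and `row_mask` are only for illustration.

```python
import pandas as pd

# Data rebuilt from the table in the question.
df = pd.DataFrame({
    "A": [1, 1, 1, 1, 2, 2, 2],
    "B": [1, 2, 2, 2, 1, 1, 1],
    "Value": [0, 10, 20, 25, 0, 15, 100],
})

# filter() makes one decision per group: if any row in the group has a
# difference below 20, every row of that group is returned.
whole_groups = df.groupby(["A", "B"]).filter(
    lambda g: (g["Value"] - g["Value"].shift(1) < 20).any()
)
print(whole_groups.index.tolist())

# A row-wise boolean mask makes one decision per row instead.
row_mask = df.groupby(["A", "B"])["Value"].diff().lt(20)
print(df[row_mask].index.tolist())
```

With filter, only group (A=1, B=1) is dropped, because its single row has no valid difference. The other two groups come back in full, including their first rows and the row with Value 100.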