I have a dataset like below
ID value
1 10
2 15
3 18
4 30
5 35
I would like to keep all the rows that has value - value of the previous row <=5, so I do
df['diff'] = df.value.diff()
df = df[df.diff <= 5]
Then I will have
ID value diff
2 15 5
3 18 3
5 35 5
However, I don't want to keep row 3, because row 2 is kept due to row 1, and as row 1 and row 2 become a pair, row 3 should not be paired with row 2 anymore.
How could I do that using pandas? Indeed I can write a for loop but it is not the best idea.
CodePudding user response:
So you have the mask that checks if difference to previous row <= 5:
>>> d = df.value.diff().le(5)
>>> d
1 False
2 True
3 True
4 False
5 True
Rows marked with True
will be kept, but you don't want to keep a True row if the previous row was also True.
Then we can shift this mask, negate it and &
with the original to convert True
's that have True
in previous row into False
:
>>> d & ~d.shift(fill_value=False)
1 False
2 True
3 False
4 False
5 True
where fill_value
is needed otherwise there arises NaN and it "can't bitwise-negate float". Putting False
there has no effect other than silencing that issue.
Now we can select the rows from the dataframe with this resultant mask:
>>> wanted = d & ~d.shift(fill_value=False)
>>> df[wanted]
ID value
2 15
5 35