I want to remove rows that have a similar 'y' value, that is, rows where y is within +/- 100 of another value.
input = data.frame(x=c(100,200,100,300,200,100,200), y=c(800,850,900,100,901,701,699))
Input:
x | y |
---|---|
100 | 800 |
200 | 850 |
100 | 900 |
300 | 100 |
200 | 901 |
100 | 701 |
200 | 699 |
Output:
x | y |
---|---|
100 | 800 |
300 | 100 |
200 | 901 |
200 | 699 |
I can't quite work out how to do this in R, as I'm either deleting values I want or leaving in values I don't want.
I think I might need to structure it as a set of pairwise comparisons. I'd sort by y, then compare row 2 to row 1 and drop row 2 if it is within range. If row 2 is outside the range, I'd treat it as my new comparison value and continue down the list.
But I don't know how to structure that - and I might be completely wrong, so open to suggestions!
CodePudding user response:
Something like this?:
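library(dplyr)
# Drop rows whose y is within +/- 100 of the first row's y,
# but keep the first row itself (y == first(y))
input %>%
  filter(!(y >= first(y) - 100 & y <= first(y) + 100) | y == first(y))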
x y
1 100 800
2 300 100
3 200 901
4 200 699
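Note that the filter above compares each row only with the first row's y. If you would rather compare each row against every row already kept, a base R loop is one possible sketch. The helper name drop_near and its tol argument are made up for illustration, and I'm assuming "within +/- 100" includes a difference of exactly 100:

```r
# Sample data from the question
input <- data.frame(x = c(100, 200, 100, 300, 200, 100, 200),
                    y = c(800, 850, 900, 100, 901, 701, 699))

# drop_near() is a hypothetical helper, not part of any package.
# It walks the rows in order and keeps a row only if its y is more
# than `tol` away from the y of every row kept so far.
drop_near <- function(df, tol = 100) {
  keep <- logical(nrow(df))            # all FALSE to start with
  for (i in seq_len(nrow(df))) {
    kept_y <- df$y[keep]               # y values of the rows kept so far
    # all() on an empty vector is TRUE, so the first row is always kept
    keep[i] <- all(abs(df$y[i] - kept_y) > tol)
  }
  df[keep, , drop = FALSE]
}

result <- drop_near(input)
print(result)
```

On the sample data this keeps the rows with y = 800, 100, 901 and 699. If you want the sort-first behaviour described in the question, you could call it as drop_near(input[order(input$y), ]) instead.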