I am trying to filter a data frame using dplyr and I can't really think of a way to achieve what I want. I have a data frame of the following form:
A B C
-----------
1 2 5
1 4 6
2 2 7
2 4 6
Each value in column A
appears exactly 2 times. Column B
has exactly 2 different values, each appearing exactly once for each value of A
. Column C
can have any positive values. I want to keep all rows where for one value of A
, the row with the bigger B
value has a smaller C
value than the row with the smaller B
value. In the example above, this would result in:
A B C
-----------
2 2 7
2 4 6
Is there a way to achieve this using dplyr?
CodePudding user response:
1) Sort by A and B to ensure that the larger B is always the second within A and then grouping by A use a filter based on diff(C) < 0.
library(dplyr)
DF %>%
arrange(A, B) %>%
group_by(A) %>%
filter((diff(C) < 0)) %>%
ungroup
## # A tibble: 2 × 3
## A B C
## <int> <int> <int>
## 1 2 2 7
## 2 2 4 6
2) Another possibility is to ensure that the maximum of B is on the same row as the minimum of C. This would also work with non-numeric data.
See comments below this answer for another idea along these lines.
DF %>%
group_by(A) %>%
filter(which.max(B) == which.min(C)) %>%
ungroup
3) If the slope of B with respect to C is negative then keep the group.
DF %>%
group_by(A) %>%
filter(coef(lm(B ~ C))[[2]] < 0) %>%
ungroup
or we can calculate the slope ourselves:
DF %>%
group_by(A) %>%
filter(diff(C) / diff(B) < 0) %>%
ungroup
Note
Lines <- "A B C
1 2 5
1 4 6
2 2 7
2 4 6"
DF <- read.table(text = Lines, header = TRUE)