I have the following data frame called test:
> test
  concat grpRnk
1    1.1      1
2    1.2      1
3    2.1      3
4    2.1      2
5    2.2      3
6    2.2      2
7    3.1      4
8    3.2      4
I run this bit of dplyr code, test %>% distinct(concat, .keep_all = TRUE), to get the following output, which keeps one row for each unique value in the concat column:
> test %>% distinct(concat, .keep_all = TRUE)
  concat grpRnk
1    1.1      1
2    1.2      1
3    2.1      3
4    2.2      3
5    3.1      4
6    3.2      4
How do I modify this code so that it instead removes rows 3 and 5 of the original test data frame, where grpRnk is 3 in both cases? The current code removes the duplicates where grpRnk is 2. A base R solution is fine too!
Here's the code for generating the test data frame:
test <- data.frame(concat = c(1.1, 1.2, 2.1, 2.1, 2.2, 2.2, 3.1, 3.2),
                   grpRnk = c(1, 1, 3, 2, 3, 2, 4, 4))
Answer:
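distinct() keeps the first row it encounters for each value of concat. So sort by grpRnk beforehand, so that within each set of duplicates the row you want to keep (the one with the lower grpRnk) comes first.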
test %>%
  arrange(grpRnk) %>%
  distinct(concat, .keep_all = TRUE)
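Since base R was mentioned as acceptable, the same sort-then-deduplicate idea can be sketched without dplyr. This is only a minimal sketch: the names sorted and result are made up for illustration, and it assumes the row with the lowest grpRnk is the one to keep.

```r
test <- data.frame(concat = c(1.1, 1.2, 2.1, 2.1, 2.2, 2.2, 3.1, 3.2),
                   grpRnk = c(1, 1, 3, 2, 3, 2, 4, 4))

# Sort by grpRnk so the lowest rank comes first for each concat value
# (assumption: the lowest grpRnk is the row to keep)
sorted <- test[order(test$grpRnk), ]

# duplicated() flags every occurrence after the first,
# so negating it keeps only the first row per concat value
result <- sorted[!duplicated(sorted$concat), ]
result
```

Note that the result comes back ordered by grpRnk rather than in the original row order, just as with arrange() in the dplyr version.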
If, as you write, which duplicate to drop depends on the values in other columns, it might be safer to take an intermediate step and create a new variable that flags every row whose concat value occurs more than once. This gives you more control, and you can delete the unwanted rows in a separate step.
test %>%
  mutate(dup = duplicated(concat, fromLast = TRUE) | duplicated(concat))
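The separate deletion step could then look something like the sketch below. The rule used here (within each duplicated concat value, keep the row with the smallest grpRnk) is an assumption based on the example in the question, and the names flagged and cleaned are made up for illustration.

```r
library(dplyr)

test <- data.frame(concat = c(1.1, 1.2, 2.1, 2.1, 2.2, 2.2, 3.1, 3.2),
                   grpRnk = c(1, 1, 3, 2, 3, 2, 4, 4))

# Step 1: flag every row whose concat value appears more than once
flagged <- test %>%
  mutate(dup = duplicated(concat, fromLast = TRUE) | duplicated(concat))

# Step 2: keep unflagged rows as they are; among flagged rows, keep only
# those with the smallest grpRnk in their concat group (assumed rule)
cleaned <- flagged %>%
  group_by(concat) %>%
  filter(!dup | grpRnk == min(grpRnk)) %>%
  ungroup() %>%
  select(-dup)

cleaned
```

Unlike the arrange() approach, this keeps the surviving rows in their original order. If two duplicated rows tie on the minimum grpRnk, both are kept, so a tie-breaking rule would need to be added for that case.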