A similar question was asked here. However, I did not manage to adopt that solution to my particular problem, hence the separate question.
An example dataset:
id group
1 1 5
2 1 998
3 2 2
4 2 3
5 3 998
I would like to delete all rows that are duplicated in id and where group has value 998. In this example, only row 2 should be deleted.
I tried something along those lines:
df1 <- df %>%
subset((unique(by = "id") | group != 998))
but got
Error in is.factor(x) : Argument "x" is missing, with no default
Thank you in advance
CodePudding user response:
Here is an idea
library(dplyr)
df %>%
group_by(id) %>%
filter(!any(n() > 1 & group == 998))
# A tibble: 3 x 2
# Groups: id [2]
id group
<int> <int>
1 2 2
2 2 3
3 3 998
In case you want to remove only the 998 entry from the group then,
df %>%
group_by(id) %>%
filter(!(n() > 1 & group == 998))
CodePudding user response:
One way could be:
library(dplyr)
df1 <- df %>%
filter(duplicated(id) & group=="998")
anti_join(df, df1)
Joining, by = c("id", "group")
id group
1 1 5
3 2 2
4 2 3
5 3 998