Deleting rows that are duplicated in one column based on value in another column


A similar question was asked here. However, I did not manage to adapt that solution to my particular problem, hence the separate question.

An example dataset:


  id group
1  1     5
2  1   998
3  2     2
4  2     3
5  3   998

I would like to delete all rows that are duplicated in id and where group has value 998. In this example, only row 2 should be deleted.
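For a reproducible example, the data frame can be recreated like this (integer columns assumed):

df <- data.frame(
  id    = c(1L, 1L, 2L, 2L, 3L),
  group = c(5L, 998L, 2L, 3L, 998L)
)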

I tried something along those lines:

df1 <- df %>%
  subset((unique(by = "id") |  group != 998))

but got

Error in is.factor(x) : Argument "x" is missing, with no default

Thank you in advance.

CodePudding user response:

Here is an idea: group by id, then drop every group that has more than one row and contains a 998 entry.

library(dplyr)

df %>% 
  group_by(id) %>% 
  # drop the whole id group if it has duplicates and any 998 entry
  filter(!any(n() > 1 & group == 998))

# A tibble: 3 x 2
# Groups:   id [2]
     id group
  <int> <int>
1     2     2
2     2     3
3     3   998

If instead you want to remove only the 998 entry and keep the other rows of that id, then:

df %>% 
  group_by(id) %>% 
  # drop only the 998 rows within duplicated ids
  filter(!(n() > 1 & group == 998))
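For the example data above, this should return every row except row 2:

# A tibble: 4 x 2
# Groups:   id [3]
     id group
  <int> <int>
1     1     5
2     2     2
3     2     3
4     3   998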

CodePudding user response:

One way could be:

library(dplyr)

df1 <- df %>% 
  # rows whose id is a repeat and whose group is 998
  filter(duplicated(id) & group == 998)

anti_join(df, df1)
Joining, by = c("id", "group")
  id group
1  1     5
3  2     2
4  2     3
5  3   998
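
Equivalently (a sketch, not part of the original answer), the two steps can be collapsed into a single negated filter:

library(dplyr)

# keep every row except duplicated-id rows whose group is 998
df %>% filter(!(duplicated(id) & group == 998))

Note that duplicated(id) flags only the second and later occurrences of each id, so both versions assume the 998 row is not the first occurrence of its id.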