I want to filter by ID where GRP is A or B or NA. If one ID contains both A B then B should be deleted.
Data:
DF<-tibble::tribble(
~ID, ~GRP,
1L, "A",
2L, "A",
2L, "B",
3L, "B",
3L, NA,
4L, "A",
4L, "A",
4L, NA
)
# A tibble: 8 × 2
ID GRP
<int> <chr>
1 1 A
2 2 A
3 2 B
4 3 B
5 3 NA
6 4 A
7 4 A
8 4 NA
Desired output:
A tibble: 7 × 2
ID GRP
<int> <chr>
1 1 A
2 2 A
3 3 B
4 3 NA
5 4 A
6 4 A
7 4 NA
Best regards, H
CodePudding user response:
We could construct a logic with n_distinct
after grouping by 'ID'
library(dplyr)
DF %>%
group_by(ID) %>%
filter(all(c("A", "B") %in% GRP) &
GRP != "B"|
(is.na(GRP)|n_distinct(GRP, na.rm = TRUE) == 1)) %>%
ungroup
-output
# A tibble: 7 × 2
ID GRP
<int> <chr>
1 1 A
2 2 A
3 3 B
4 3 <NA>
5 4 A
6 4 A
7 4 <NA>
Or create the logic with the first
non-NA element
DF %>%
group_by(ID) %>%
filter(GRP %in% first(na.omit(GRP)) | is.na(GRP)) %>%
ungroup
# A tibble: 7 × 2
ID GRP
<int> <chr>
1 1 A
2 2 A
3 3 B
4 3 <NA>
5 4 A
6 4 A
7 4 <NA>
Or may use two n_distinct
DF %>%
group_by(ID) %>%
filter(n_distinct(GRP, na.rm = TRUE) == 2 & GRP != "B"|
n_distinct(GRP, na.rm = TRUE) == 1) %>%
ungroup
Or using base R
i1 <- with(DF, ID %in% names(which(rowSums(table(ID, GRP) > 0) == 2)))
subset(DF, GRP == 'A' & i1|!i1)