I would like to delete every row of an ID if any of that ID's rows contains certain strings (A or D) in dx. Here is my data frame:
id time dx
1 1 C
1 2 B
2 1 A
2 2 C
2 3 B
3 1 D
I would like the following:
id time dx
1 1 C
1 2 B
Based on the earlier post regarding this (Delete rows containing specific strings in R), I tried d %>% filter(!grepl('A|D', dx)). However, it only deletes the rows that contain A or D, not the whole IDs. I'd appreciate any help!
Update: All of the answers below worked well for the above post. Thank you all! Note that I simplified my data frame for this post, and I later realized that I actually needed R code to delete the IDs whose dx contains certain partial strings (e.g., A or B0) in the following data frame. I was able to achieve this by modifying the first of r2evans' answers, using str_detect from the stringr package: d %>% group_by(id) %>% filter(!any(str_detect(dx, "A|B0"))) %>% ungroup(). I have included the note here in case someone needs it. I would appreciate any additional suggestions.
Data frame:
id time dx
1 1 C01
1 2 B1
2 1 A34
2 2 C01
2 3 B1
3 1 B01X
The results I wanted:
id time dx
1 1 C01
1 2 B1
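For the updated data, the same group-wise idea can also be sketched in base R, with grepl in place of %in%. This is only a minimal sketch: the name d2 is hypothetical, and the pattern "A|B0" is the one from the update above.

```r
# Updated example data (d2 is a hypothetical name for it)
d2 <- data.frame(
  id   = c(1L, 1L, 2L, 2L, 2L, 3L),
  time = c(1L, 2L, 1L, 2L, 3L, 1L),
  dx   = c("C01", "B1", "A34", "C01", "B1", "B01X"),
  stringsAsFactors = FALSE
)

# TRUE for each row whose dx contains "A" or "B0" anywhere in the string
hit <- grepl("A|B0", d2$dx)

# Spread the flag to every row of the same id, then keep the unflagged ids
res <- d2[!ave(hit, d2$id, FUN = any), ]
res
```

Because the pattern is not anchored, it matches anywhere in dx. If only codes that start with A or B0 should count, "^(A|B0)" would be the anchored form.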
CodePudding user response:
grep is the wrong tool for this, based on your question and sample data; I think %in% is the better way to go. Combine that with a natural dplyr::group_by and an any(.) conditional, and we get our results.
dplyr
library(dplyr)
dat %>%
group_by(id) %>%
filter(!any(dx %in% c("A", "D"))) %>%
ungroup()
# # A tibble: 2 x 3
# id time dx
# <int> <int> <chr>
# 1 1 1 C
# 2 1 2 B
base R
dat[ave(dat$dx, dat$id, FUN = function(z) !any(z %in% c("A", "D"))) == "TRUE",]
# id time dx
# 1 1 1 C
# 2 1 2 B
(ave returns a vector of the same class as its input, which in this case is character, so the logical result of FUN is coerced to a string. That's why I'm comparing against the string "TRUE" instead of using it as a literal TRUE.)
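The coercion described above can be checked directly. The snippet below is a small sketch on bare vectors (dx and id copied from the sample data; keep_chr and keep_lgl are hypothetical names), comparing the character result with the result of passing a logical vector to ave instead.

```r
dx <- c("C", "B", "A", "C", "B", "D")
id <- c(1L, 1L, 2L, 2L, 2L, 3L)

# Character input: the logical returned by FUN is stored as "TRUE"/"FALSE"
keep_chr <- ave(dx, id, FUN = function(z) !any(z %in% c("A", "D")))
class(keep_chr)

# Logical input: the result stays logical, so no string comparison is needed
keep_lgl <- !ave(dx %in% c("A", "D"), id, FUN = any)
class(keep_lgl)

# Both routes select the same rows
identical(keep_chr == "TRUE", keep_lgl)
```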
Data
dat <- structure(list(id = c(1L, 1L, 2L, 2L, 2L, 3L), time = c(1L, 2L, 1L, 2L, 3L, 1L), dx = c("C", "B", "A", "C", "B", "D")), class = "data.frame", row.names = c(NA, -6L))
CodePudding user response:
We may use subset in base R
subset(df1, !id %in% id[dx %in% c("A", "D")])
id time dx
1 1 1 C
2 1 2 B
Or a similar option with filter from dplyr
library(dplyr)
filter(df1, !id %in% id[dx %in% c("A", "D")])
id time dx
1 1 1 C
2 1 2 B
data
df1 <- structure(list(id = c(1L, 1L, 2L, 2L, 2L, 3L), time = c(1L, 2L,
1L, 2L, 3L, 1L), dx = c("C", "B", "A", "C", "B", "D")),
class = "data.frame", row.names = c(NA,
-6L))
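To see why the nested %in% works, the condition can be unpacked into steps. This is a sketch only; flagged, bad_ids, and kept are hypothetical names introduced for illustration.

```r
df1 <- data.frame(
  id   = c(1L, 1L, 2L, 2L, 2L, 3L),
  time = c(1L, 2L, 1L, 2L, 3L, 1L),
  dx   = c("C", "B", "A", "C", "B", "D"),
  stringsAsFactors = FALSE
)

# Step 1: mark the rows whose dx is exactly "A" or "D"
flagged <- df1$dx %in% c("A", "D")

# Step 2: collect the ids that own at least one marked row
bad_ids <- unique(df1$id[flagged])

# Step 3: drop every row belonging to one of those ids
kept <- df1[!df1$id %in% bad_ids, ]
kept
```

Inside subset and filter the column names are evaluated within the data frame, which is why the one-line versions can write id and dx without the df1$ prefix.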
CodePudding user response:
Another base R option using subset and ave (df here is the data frame from the question)
subset(
df,
!ave(dx %in% c("A", "D"), id, FUN = any)
)
gives
id time dx
1 1 1 C
2 1 2 B