In an earlier post I asked how to drop an entire id if any of its rows contains certain strings (e.g., A or D) in a long-format (longitudinal) data frame. These are the R solutions I received there, in order:
- dat %>% group_by(id) %>% filter(!any(dx %in% c("A", "D"))) %>% ungroup()
- filter(df1, !id %in% id[dx %in% c("A", "D")])
- subset(df, !ave(dx %in% c("A", "D"), id, FUN = any))
While these all worked well, I realized that I have to remove more than 600 strings (e.g., A, D, E2, F112, G203, etc.), so I created a csv file that lists these strings in a single column without a column name. 1. Is putting the strings in a file like this the right approach? 2. How should I modify the R code above so that it uses the strings from the file? I reviewed the other post and Google search results, but I could not figure out how to apply them to my case. I would appreciate any suggestions!
Data frame:
id time dx
1 1 C
1 2 B
2 1 A
2 2 B
3 1 D
4 1 G203
4 2 E1
The results I want:
id time dx
1 1 C
1 2 B
CodePudding user response:
This is a good strategy:
Put your values in a vector (here my_list), then filter the dx column using the %in% operator, negated with !:
library(dplyr)
my_list <- c("A", "D")
df %>%
filter(!dx %in% my_list)
id time dx
1 1 1 C
2 1 2 B
3 2 2 B
4 4 1 G203
5 4 2 E1
Expanding the vector to my_list <- c("A", "D", "G203", "E1") and running the same code gives:
library(dplyr)
df %>%
filter(!dx %in% my_list)
id time dx
1 1 1 C
2 1 2 B
3 2 2 B
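Note that filter(!dx %in% my_list) works row by row, so an id such as 2 keeps its rows that are not in the list. To drop the whole id and take the strings from the csv file, something like the following might work. It is only a sketch: the file is created here as a temporary stand-in (codes_file is a hypothetical path, and the five codes written to it are just the examples from the question), and it assumes one string per line with no header, which is why header = FALSE is used and the column comes back under the default name V1.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  id   = c(1, 1, 2, 2, 3, 4, 4),
  time = c(1, 2, 1, 2, 1, 1, 2),
  dx   = c("C", "B", "A", "B", "D", "G203", "E1")
)

# Stand-in for the real csv (hypothetical path): one string per line, no header
codes_file <- tempfile(fileext = ".csv")
writeLines(c("A", "D", "E2", "F112", "G203"), codes_file)

# header = FALSE because the file has no column name; read.csv then names
# the single column V1. colClasses keeps every entry as a character string.
codes <- read.csv(codes_file, header = FALSE, colClasses = "character")$V1
codes <- trimws(codes)  # guard against stray spaces in the file

# Drop every id that has at least one dx value in the list
result <- df %>%
  group_by(id) %>%
  filter(!any(dx %in% codes)) %>%
  ungroup()

print(result)
```

The other two solutions from the question should take the same vector in place of c("A", "D"), for example filter(df, !id %in% id[dx %in% codes]).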