I have the following data:
Sample_ID<-c("a1_01_01","a2_03_03","a3_07_07","a4_09_09","a5_10_10","a6_21_21")
Sex<-c("M", "M", "F", "F", "M", "NM")
DF1<-data.frame(Sample_ID,Sex)
I want to subset the above data frame on the basis of the following list:
Excluded <-c("a1_01", "a3_07", "a5_10")
I am using this code to do it:
Newdf<-subset(DF1,Sample_ID %in% Excluded)
but it's not working because, as you can see, the values in Excluded and Sample_ID are not exactly the same; only their prefixes match. I think I need to combine my R command with grep, but I could not figure out how. Can someone please help, or suggest a simpler way?
CodePudding user response:
You can do:
DF1[!grepl(paste(Excluded, collapse = "|"), DF1$Sample_ID),]
#>   Sample_ID Sex
#> 2  a2_03_03   M
#> 4  a4_09_09   F
#> 6  a6_21_21  NM
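This works by pasting the strings in Excluded into a single regex, a1_01|a3_07|a5_10, that matches any of them. grepl returns TRUE for every Sample_ID containing a match, and the logical negation (!) makes the subsetting keep only the rows that did not match.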
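To make those steps visible, here is a minimal sketch that prints the intermediate values, using the data from the question. The anchored variant at the end is an addition that is not part of the answer above; it assumes the excluded strings should only match at the start of Sample_ID. The names pattern, hit, kept, anchored and kept_anchored are illustrative.

```r
Sample_ID <- c("a1_01_01", "a2_03_03", "a3_07_07", "a4_09_09", "a5_10_10", "a6_21_21")
Sex <- c("M", "M", "F", "F", "M", "NM")
DF1 <- data.frame(Sample_ID, Sex)
Excluded <- c("a1_01", "a3_07", "a5_10")

# Step 1: join the excluded prefixes into a single alternation pattern
pattern <- paste(Excluded, collapse = "|")
pattern
#> [1] "a1_01|a3_07|a5_10"

# Step 2: TRUE wherever Sample_ID contains one of the prefixes
hit <- grepl(pattern, DF1$Sample_ID)
hit
#> [1]  TRUE FALSE  TRUE FALSE  TRUE FALSE

# Step 3: keep only the rows that did NOT match
kept <- DF1[!hit, ]

# Assumed variant: anchor at the start so "a1_01" cannot match mid-string
anchored <- paste0("^(", pattern, ")")
kept_anchored <- DF1[!grepl(anchored, DF1$Sample_ID), ]
```

For this data both versions keep the same three rows. The anchor would only matter if an excluded string could also appear later inside some other ID.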
CodePudding user response:
Another possible solution, tidyverse-based:
library(tidyverse)
Sample_ID <- c("a1_01_01","a2_03_03","a3_07_07","a4_09_09","a5_10_10","a6_21_21")
Sex<-c("M", "M", "F", "F", "M", "M")
DF1<-data.frame(Sample_ID,Sex)
Excluded <-c("a1_01", "a3_07", "a5_10")
DF1 %>%
  filter(!str_remove(Sample_ID, "_\\d{2}$") %in% Excluded)
#>   Sample_ID Sex
#> 1  a2_03_03   M
#> 2  a4_09_09   F
#> 3  a6_21_21   M
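The same idea (strip the trailing "_NN" suffix, then compare exactly with %in%) can be sketched in base R with sub(), without loading any packages. This is an illustrative equivalent rather than part of the answer above. It uses the data from the question, the names prefix, kept and dropped are hypothetical, and it assumes every Sample_ID ends in an underscore followed by two digits.

```r
Sample_ID <- c("a1_01_01", "a2_03_03", "a3_07_07", "a4_09_09", "a5_10_10", "a6_21_21")
Sex <- c("M", "M", "F", "F", "M", "NM")
DF1 <- data.frame(Sample_ID, Sex)
Excluded <- c("a1_01", "a3_07", "a5_10")

# Strip the trailing "_NN" so each ID reduces to its prefix,
# e.g. "a1_01_01" becomes "a1_01"
prefix <- sub("_[0-9]{2}$", "", DF1$Sample_ID)

# Rows whose prefix is NOT in Excluded
kept <- DF1[!(prefix %in% Excluded), ]

# Rows whose prefix IS in Excluded (what the question's subset() call selects)
dropped <- DF1[prefix %in% Excluded, ]
```

Because the comparison is an exact match on the prefix rather than a regex search, a partial overlap between an excluded string and some other ID cannot produce a false match.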