Subsetting dataframe with grep

Time:03-07

I have the following data:

Sample_ID <- c("a1_01_01", "a2_03_03", "a3_07_07", "a4_09_09", "a5_10_10", "a6_21_21")
Sex <- c("M", "M", "F", "F", "M", "NM")
DF1 <- data.frame(Sample_ID, Sex)

I want to subset the above data frame on the basis of the following list:

Excluded <- c("a1_01", "a3_07", "a5_10")

I am using this code to do it:

Newdf<-subset(DF1,Sample_ID %in% Excluded)

but it's not working. As you can see, the values in Excluded and Sample_ID are not exactly the same; they only share the same leading part. I think I need to combine my R command with grep, but I could not figure out how. Can someone please help me, or suggest a simpler way?

CodePudding user response:

You can do:

DF1[!grepl(paste(Excluded, collapse = "|"), DF1$Sample_ID),]
#>   Sample_ID Sex
#> 2  a2_03_03   M
#> 4  a4_09_09   F
#> 6  a6_21_21  NM

This works by building a regex that matches any of the strings in Excluded, then dropping the matching rows with logical negation and subsetting.
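
To make the regex step concrete, here is a minimal sketch that prints the combined pattern and also tries an anchored variant. The anchoring is an assumption about intent (that each entry in Excluded should match only at the start of Sample_ID, followed by an underscore); the names pattern, anchored, kept and removed are made up for illustration.

```r
Sample_ID <- c("a1_01_01", "a2_03_03", "a3_07_07", "a4_09_09", "a5_10_10", "a6_21_21")
Sex <- c("M", "M", "F", "F", "M", "NM")
DF1 <- data.frame(Sample_ID, Sex)
Excluded <- c("a1_01", "a3_07", "a5_10")

# Join the excluded prefixes into one alternation pattern
pattern <- paste(Excluded, collapse = "|")
print(pattern)
#> [1] "a1_01|a3_07|a5_10"

# Assumed refinement: match only at the start of the ID, followed by "_",
# so that a prefix cannot match somewhere in the middle of another ID
anchored <- paste0("^(", pattern, ")_")

kept <- DF1[!grepl(anchored, DF1$Sample_ID), ]    # rows NOT in Excluded
removed <- DF1[grepl(anchored, DF1$Sample_ID), ]  # rows that are in Excluded
```

Dropping the ! gives the matching rows instead, which may be what the original subset() call was after.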

CodePudding user response:

Another possible solution, tidyverse-based:

library(tidyverse)

Sample_ID <- c("a1_01_01","a2_03_03","a3_07_07","a4_09_09","a5_10_10","a6_21_21")
Sex<-c("M", "M", "F", "F", "M", "M")
DF1<-data.frame(Sample_ID,Sex)

Excluded <-c("a1_01", "a3_07", "a5_10")

DF1 %>% 
  filter(!str_remove(Sample_ID, "_\\d{2}$") %in% Excluded)

#>   Sample_ID Sex
#> 1  a2_03_03   M
#> 2  a4_09_09   F
#> 3  a6_21_21   M
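
The same idea (strip the trailing "_NN" and then compare exactly with %in%) can probably be done without loading the tidyverse. A rough base R sketch, assuming every Sample_ID ends in an underscore plus two digits; the names prefix and kept are hypothetical:

```r
Sample_ID <- c("a1_01_01", "a2_03_03", "a3_07_07", "a4_09_09", "a5_10_10", "a6_21_21")
Sex <- c("M", "M", "F", "F", "M", "M")
DF1 <- data.frame(Sample_ID, Sex)
Excluded <- c("a1_01", "a3_07", "a5_10")

# Remove the final "_NN" so that "a1_01_01" becomes "a1_01"
prefix <- sub("_[0-9]{2}$", "", DF1$Sample_ID)

# Exact matching on the shortened IDs; negate to drop the excluded samples
kept <- DF1[!prefix %in% Excluded, ]
```

Because the comparison is exact rather than a regex search, an entry in Excluded cannot accidentally match part of a longer ID.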