How to subset a data frame based on specific or similar strings

As an example, I created a data frame like this:

df<- structure(list(Names = c("AA", "ab", "CC", "AVY"), Column1 = c(0L, 
0L, 0L, 1L), Column2 = c(0L, 0L, 0L, 0L), Column3 = c(0L, 0L, 
0L, 1L)), class = "data.frame", row.names = c(NA, -4L))

I want to subset the data frame and keep only the rows that contain my strings (let's say three of them). If I only want to find the position, I can use the following to find where CC is:

which(df=="CC", arr.ind=TRUE)

or, if I want to find two of them, CC and ab, I can use

which(df==c("CC","ab"), arr.ind=TRUE)

but what I want is to subset those rows into a new data frame like the one below:

Names   Column1 Column2 Column3
ab       0       0       0
CC       0       0       0

Based on other people's solutions, I found how to do this for one string at a time, but I still don't know how to do it for more than one:

df[sapply(df, function(x) grepl("CC",x)),]

CodePudding user response：

Use %in%, either with bracket indexing or inside subset(); there is no need for which or sapply.

df[df$Names %in% c("CC","ab"), ]
#   Names Column1 Column2 Column3
# 2    ab       0       0       0
# 3    CC       0       0       0

subset(df, Names %in% c("CC","ab"))
#   Names Column1 Column2 Column3
# 2    ab       0       0       0
# 3    CC       0       0       0

%in% is a close relative of match.

df$Names %in% c("CC","ab")
# [1] FALSE  TRUE  TRUE FALSE

basically does

!is.na(match(df$Names, c("CC","ab")))
# [1] FALSE  TRUE  TRUE FALSE
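Note that %in% only keeps exact matches. Since the question also mentions "similar" strings, here is a minimal base R sketch for partial matches. It assumes the search strings are kept in a vector (the name targets is hypothetical) and that they contain no regular expression metacharacters, because they are pasted together into one pattern.

```r
# Rebuild the example data from the question
df <- data.frame(
  Names = c("AA", "ab", "CC", "AVY"),
  Column1 = c(0L, 0L, 0L, 1L),
  Column2 = c(0L, 0L, 0L, 0L),
  Column3 = c(0L, 0L, 0L, 1L)
)

# Hypothetical vector of substrings to look for
targets <- c("CC", "ab")

# Collapse the vector into a single regular expression: "CC|ab"
pattern <- paste(targets, collapse = "|")

# grepl() is TRUE for every Names value that contains either substring
partial <- df[grepl(pattern, df$Names), ]

# The match is case-sensitive by default; ignore.case = TRUE relaxes that
loose <- df[grepl("aa|av", df$Names, ignore.case = TRUE), ]
```

With this data, partial should hold the ab and CC rows, and loose the AA and AVY rows.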

CodePudding user response：

Are you looking for this kind of solution?

With str_detect, we can define the strings to filter on as a single pattern joined by the | (or) operator:

library(dplyr)
library(stringr)
df %>% 
  filter(str_detect(Names, 'CC|ab'))

  Names Column1 Column2 Column3
1    ab       0       0       0
2    CC       0       0       0
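One thing to keep in mind: an unanchored pattern such as 'CC|ab' matches any value that contains one of the strings, not only values equal to them. The sketch below illustrates the difference with base R's grepl so that it runs without extra packages; the anchored pattern can be passed to str_detect in the same way. The vector names_vec and its extra "abc" entry are made up for illustration.

```r
# Names from the question plus a hypothetical "abc" to show the difference
names_vec <- c("AA", "ab", "CC", "AVY", "abc")

# Unanchored: TRUE for any value that contains "CC" or "ab"
contains <- grepl("CC|ab", names_vec)

# Anchored with ^ and $: TRUE only for values that are exactly "CC" or "ab"
exact <- grepl("^(CC|ab)$", names_vec)

names_vec[contains]
names_vec[exact]
```

Here the unanchored version also keeps "abc", while the anchored version does not.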

CodePudding user response：

Or, if you know the names you want ahead of time:

library(dplyr)

my_names <- c("ab", "CC")

df_subset <- df %>%
  filter(Names %in% my_names)
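The answers above all look in the Names column only, whereas the which(df == ...) call in the question scans every column. If the strings might appear in any column, a base R sketch along these lines could be used; the names targets, hits and any_col are hypothetical.

```r
# Rebuild the example data from the question
df <- data.frame(
  Names = c("AA", "ab", "CC", "AVY"),
  Column1 = c(0L, 0L, 0L, 1L),
  Column2 = c(0L, 0L, 0L, 0L),
  Column3 = c(0L, 0L, 0L, 1L)
)

# Hypothetical vector of strings to keep
targets <- c("CC", "ab")

# One logical column per data frame column: TRUE where the cell equals a target
hits <- sapply(df, function(col) col %in% targets)

# Keep the rows that have at least one matching cell in any column
any_col <- df[rowSums(hits) > 0, ]
```

For this data the result is the same two rows as before, because the strings only occur in Names.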