grep for names with variable number of forward slashes

Time:11-29

I have a column containing two types of names, each embedded within a longer string.

Names look like A/HK/RATATA/Lol(2007) or A/chickapig/RATATA/Lol(2003).

I would like to filter rows with a regular expression based on the number of "/" characters in each name.

Example: 
Influenza A virus (A/chicken/Wenzhou/642/2013(H9N2))
Influenza A virus (A/chicken/Wenzhou/643/2013(H9N2))
Influenza A virus (A/chicken/Wenzhou/644/2013(H9N2))
Influenza A virus (A/Wenzhou/mamamam/2013(H9N2))

I would like to keep only the row containing Influenza A virus (A/Wenzhou/mamamam/2013(H9N2)).

I tried using \ to escape the /, but I am not sure that even makes sense.

CodePudding user response:

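If the filter is based on the count of /, use str_count from stringr to filter the rows

library(dplyr)
library(stringr)  # provides str_count() and fixed()
n <- 3  # number of "/" a name must contain
df %>%
   filter(str_count(col1, fixed("/")) == n)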

-output

                                           col1
1 Influenza A virus (A/Wenzhou/mamamam/2013(H9N2))

data

df <- structure(list(col1 = c("Influenza A virus (A/chicken/Wenzhou/642/2013(H9N2))", 
"Influenza A virus (A/chicken/Wenzhou/643/2013(H9N2))", "Influenza A virus (A/chicken/Wenzhou/644/2013(H9N2))", 
"Influenza A virus (A/Wenzhou/mamamam/2013(H9N2))")),
 class = "data.frame", row.names = c(NA, 
-4L))
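
To see why n is 3 here, it can help to look at the per-row counts before filtering. This is only a minimal sketch, assuming stringr is installed; the names strains and slash_counts are made up for illustration and are not part of the answer above.

```r
library(stringr)

# Hypothetical vector with one row of each name type from the question
strains <- c(
  "Influenza A virus (A/chicken/Wenzhou/642/2013(H9N2))",
  "Influenza A virus (A/Wenzhou/mamamam/2013(H9N2))"
)

# fixed("/") treats the slash as a literal character rather than a regex
slash_counts <- str_count(strains, fixed("/"))
slash_counts
```

The first name should give 4 and the second 3, which is why filtering on a count of 3 keeps only the shorter name.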

CodePudding user response:

Similar to @akrun's solution, we could do it with nchar in combination with gsub:

library(dplyr)
library(tibble)

# example tibble
df <- tibble(x = c("Influenza A virus (A/chicken/Wenzhou/642/2013(H9N2))",
             "Influenza A virus (A/chicken/Wenzhou/643/2013(H9N2))",
             "Influenza A virus (A/chicken/Wenzhou/644/2013(H9N2))",
             "Influenza A virus (A/Wenzhou/mamamam/2013(H9N2))"))

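# number of "/" = full length minus the length after deleting every "/"
# ("/" is not a regex metacharacter, so it needs no escaping)
df %>% 
  filter(nchar(x) - nchar(gsub("/", "", x, fixed = TRUE)) == 3)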
  x                                               
  <chr>                                           
1 Influenza A virus (A/Wenzhou/mamamam/2013(H9N2))
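
Since the question asked for a regular expression, here is a rough base R sketch of that route as well. It is not taken from either answer above; the names x, n, pattern and has_n_slashes are assumptions for illustration. Note that "/" has no special meaning in a regular expression, so it does not need a backslash.

```r
# Same example rows as in the question
x <- c(
  "Influenza A virus (A/chicken/Wenzhou/642/2013(H9N2))",
  "Influenza A virus (A/chicken/Wenzhou/643/2013(H9N2))",
  "Influenza A virus (A/chicken/Wenzhou/644/2013(H9N2))",
  "Influenza A virus (A/Wenzhou/mamamam/2013(H9N2))"
)

n <- 3  # required number of "/"

# Anchored pattern: a run of non-slashes, then exactly n groups of
# "/" followed by a run of non-slashes, and nothing else
pattern <- sprintf("^[^/]*(/[^/]*){%d}$", n)
has_n_slashes <- grepl(pattern, x)

# Keep only the strings with exactly n slashes
x[has_n_slashes]
```

The ^ and $ anchors matter here: without them the pattern would also match strings that contain more than n slashes. With a data frame, the same logical vector could be used inside filter() or for row indexing.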