I have a column with two types of names (embedded within a longer string).
The names look like A/HK/RATATA/Lol(2007) or A/chickapig/RATATA/Lol(2003).
I would like to filter using a regular expression based on the number of "/" within each name.
Example:
Influenza A virus (A/chicken/Wenzhou/642/2013(H9N2))
Influenza A virus (A/chicken/Wenzhou/643/2013(H9N2))
Influenza A virus (A/chicken/Wenzhou/644/2013(H9N2))
Influenza A virus (A/Wenzhou/mamamam/2013(H9N2))
I would like to keep only the row containing Influenza A virus (A/Wenzhou/mamamam/2013(H9N2)).
I tried using \ to escape the /, but I am not sure that even makes sense.
CodePudding user response:
If the filter is based on the count of /, use str_count (from the stringr package) to filter the rows:
library(dplyr)
library(stringr)  # provides str_count() and fixed()
n <- 3  # number of "/" a row must contain to be kept
df %>%
  filter(str_count(col1, fixed("/")) == n)
Output:
col1
1 Influenza A virus (A/Wenzhou/mamamam/2013(H9N2))
Data:
df <- structure(list(col1 = c(
  "Influenza A virus (A/chicken/Wenzhou/642/2013(H9N2))",
  "Influenza A virus (A/chicken/Wenzhou/643/2013(H9N2))",
  "Influenza A virus (A/chicken/Wenzhou/644/2013(H9N2))",
  "Influenza A virus (A/Wenzhou/mamamam/2013(H9N2))"
)), class = "data.frame", row.names = c(NA, -4L))
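Since the question asks specifically about a regular expression, here is a minimal sketch of a regex-only alternative in base R. The pattern and the names flu, pattern, and keep are my own illustration, not part of the answer above, and it assumes the target is exactly three slashes. Note that / is not a regex metacharacter in R, so it does not need a backslash.

```r
# Same strings as in the question (hypothetical data frame name: flu)
flu <- data.frame(col1 = c(
  "Influenza A virus (A/chicken/Wenzhou/642/2013(H9N2))",
  "Influenza A virus (A/chicken/Wenzhou/643/2013(H9N2))",
  "Influenza A virus (A/chicken/Wenzhou/644/2013(H9N2))",
  "Influenza A virus (A/Wenzhou/mamamam/2013(H9N2))"
))

# ^[^/]*       a run of non-slash characters from the start of the string
# (/[^/]*){3}  exactly three groups of "one slash + a run of non-slashes"
# $            end of string, so a fourth slash makes the match fail
pattern <- "^[^/]*(/[^/]*){3}$"

keep <- grepl(pattern, flu$col1)
res <- flu[keep, , drop = FALSE]  # drop = FALSE keeps the data frame shape
print(res)
```

Changing the {3} quantifier to another number selects rows with that many slashes instead.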
CodePudding user response:
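Similar to @akrun's solution, we could do it with nchar in combination with gsub: removing every / from the string and comparing the lengths before and after gives the number of slashes.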
library(dplyr)
library(tibble)
# example tibble
df <- tibble(x = c("Influenza A virus (A/chicken/Wenzhou/642/2013(H9N2))",
                   "Influenza A virus (A/chicken/Wenzhou/643/2013(H9N2))",
                   "Influenza A virus (A/chicken/Wenzhou/644/2013(H9N2))",
                   "Influenza A virus (A/Wenzhou/mamamam/2013(H9N2))"))
# the number of "/" is the length lost when every "/" is removed;
# "/" is not a regex metacharacter, so it needs no escaping
df %>%
  filter(nchar(x) - nchar(gsub("/", "", x, fixed = TRUE)) == 3)
x
<chr>
1 Influenza A virus (A/Wenzhou/mamamam/2013(H9N2))
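If you would rather avoid extra packages, the same count can be obtained in base R. This is only a sketch under the assumption that the strings live in a plain character vector x; the name n_slash is hypothetical.

```r
x <- c("Influenza A virus (A/chicken/Wenzhou/642/2013(H9N2))",
       "Influenza A virus (A/chicken/Wenzhou/643/2013(H9N2))",
       "Influenza A virus (A/chicken/Wenzhou/644/2013(H9N2))",
       "Influenza A virus (A/Wenzhou/mamamam/2013(H9N2))")

# gregexpr() locates every literal "/" in each string,
# regmatches() extracts those matches as a list of character vectors,
# lengths() counts them per string (0 when a string has no "/")
n_slash <- lengths(regmatches(x, gregexpr("/", x, fixed = TRUE)))

# keep only the strings with exactly three slashes
x[n_slash == 3]
```

Unlike the nchar difference, this counts matches directly, so it also works unchanged if the thing being counted is longer than one character.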