I want to filter my data frame based on whether one column contains text that appears in a vector. The string in each cell of the column is quite long, and I only need the vector item to appear within the string.
I can do this for a single reference, e.g.:
library(dplyr)
starwars %>%
  filter(grepl("at", name))
But what if I want to use a vector of references?
attributes = c("at", "oo", "un")
CodePudding user response:
Base R option:
library(dplyr) # for starwars dataset
attributes = c("at", "oo", "un")
starwars[grepl(paste(attributes, collapse="|"), starwars$name),]
#> # A tibble: 15 × 14
#> name height mass hair_…¹ skin_…² eye_c…³ birth…⁴ sex gender homew…⁵
#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr> <chr>
#> 1 Beru White… 165 75 brown light blue 47 fema… femin… Tatooi…
#> 2 Palpatine 170 75 grey pale yellow 82 male mascu… Naboo
#> 3 Nien Nunb 160 68 none grey black NA male mascu… Sullust
#> 4 Nute Gunray 191 90 none mottle… red NA male mascu… Cato N…
#> 5 Roos Tarpa… 224 82 none grey orange NA male mascu… Naboo
#> 6 Watto 137 NA black blue, … yellow NA male mascu… Toydar…
#> 7 Bib Fortuna 180 NA none pale pink NA male mascu… Ryloth
#> 8 Ki-Adi-Mun… 198 82 white pale yellow 92 male mascu… Cerea
#> 9 Yarael Poof 264 NA none white yellow NA male mascu… Quermia
#> 10 Plo Koon 188 80 none orange black 22 male mascu… Dorin
#> 11 Dooku 193 80 white fair brown 102 male mascu… Serenno
#> 12 Taun We 213 NA none grey black NA fema… femin… Kamino
#> 13 Ratts Tyer… 79 15 none grey, … unknown NA male mascu… Aleen …
#> 14 Wat Tambor 193 48 none green,… unknown NA male mascu… Skako
#> 15 Sly Moore 178 48 none pale white NA <NA> <NA> Umbara
#> # … with 4 more variables: species <chr>, films <list>, vehicles <list>,
#> # starships <list>, and abbreviated variable names ¹hair_color, ²skin_color,
#> # ³eye_color, ⁴birth_year, ⁵homeworld
Created on 2022-09-09 with reprex v2.0.2
CodePudding user response:
Use paste with collapse = "|" to detect multiple patterns.
attributes = c("at", "oo", "un")
#paste(attributes, collapse = "|")
#[1] "at|oo|un"
starwars %>%
  filter(grepl(paste(attributes, collapse = "|"), name))
Another way, if you don't want to use paste:
library(stringr) # for str_detect(); the \(x) lambda needs R >= 4.1
starwars %>%
  filter(sapply(name, \(x) any(sapply(attributes, str_detect, string = x))))
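One caveat with the paste(..., collapse = "|") approach is that each vector item is read as a regular expression, so items containing characters such as "." or "+" may match more or less than intended. A minimal base R sketch of literal matching is shown here, assuming that is the behaviour you want. The data frame df and the vector patterns are hypothetical and only there for illustration; they are not from the question.

```r
# Hypothetical data, for illustration only (not from the question).
df <- data.frame(
  name = c("A.B Smith", "AxB Jones", "C+D Lee", "Plain Name"),
  stringsAsFactors = FALSE
)
patterns <- c("A.B", "C+D")  # items that contain regex metacharacters

# Regex alternation: "." matches any character, so "AxB Jones" also matches,
# and "C+D" is read as "one or more C followed by D", so "C+D Lee" is missed.
regex_hits <- grepl(paste(patterns, collapse = "|"), df$name)

# Literal matching: one grepl(fixed = TRUE) call per pattern, combined with OR.
literal_hits <- Reduce(`|`, lapply(patterns, grepl, x = df$name, fixed = TRUE))

# Keep only the rows whose name contains at least one pattern literally.
df[literal_hits, , drop = FALSE]
```

The same logical vector can be passed to filter() in a dplyr pipeline. For plain alphabetic items like "at", "oo", "un", both approaches select the same rows.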