I have been trying to use dplyr in R to filter a large data frame that has some empty (NA) cells in it. The search term I want to use is a vector containing several alphanumeric strings.
My goal is to create a new data frame or tibble of the rows that contain ANY of the strings in the vector in ANY of the columns of the data frame.
I have tried several things with a data frame I cannot share, but I found an answer in another question that almost does what I need, except for using a vector as the search term.
From "Filter rows which contain a certain string":
Filtering for rows where any column fulfils a condition
ggplot2::diamonds %>%
filter(if_any(everything(), ~ grepl('V',.))) %>%
head()
#> # A tibble: 6 × 10
#> carat cut color clarity depth table price x y z
#> <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
#> 1 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31
#> 2 0.29 Premium I VS2 62.4 58 334 4.2 4.23 2.63
#> 3 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48
#> 4 0.24 Very Good I VVS1 62.3 57 336 3.95 3.98 2.47
#> 5 0.26 Very Good H SI1 61.9 55 337 4.07 4.11 2.53
#> 6 0.22 Fair E VS2 65.1 61 337 3.87 3.78 2.49
Instead of V as the search term, what if I wanted to filter for a match to ANY value in a vector?
vector1 <- c("V", "F", "G", "E")
Some things I tried on my own data frame that worked for one value but not when using the vector as a search term:
dfdiamonds <- as.data.frame(ggplot2::diamonds)
test1 <- dfdiamonds %>%
rowwise() %>%
filter(any(c_across(cols=everything()) %in% c(vector1)))
test2<- for(item in vector1) {
dfdiamonds %>%
rowwise() %>%
filter(any(c_across(cols=2) == item))
}
test3 <- filter(dfdiamonds, any(c_across(cols = everything()) %in% c(vector1)))
#I tried grep for this one and it gave a result as a value rather than a data frame
matches <- unique (grep(paste(vector1,collapse="|"),
dfdiamonds, value=TRUE))
Anyway, I'm at a loss. Any solution will do!
CodePudding user response:
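Collapse the vector into a single regular expression with "|" (alternation), so that grepl() matches a cell containing any of the terms:
library(dplyr)
ggplot2::diamonds %>%
filter(if_any(everything(), ~ grepl(paste0(vector1, collapse = "|"), .))) %>%
head()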
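Since the question mentions NA cells, here is a minimal sketch of how the collapsed pattern behaves on them. The data frame `toy_df` and its column names are made up for illustration and are not from the question. The point is that `grepl()` returns `FALSE` for `NA`, so an empty cell neither matches nor causes an error.

```r
library(dplyr)

vector1 <- c("V", "F", "G", "E")

# Hypothetical toy data with NA cells, standing in for the real data frame
toy_df <- tibble(
  id    = c("a1", "b2", "c3", "d4"),
  code  = c("VS1", NA, "xyz", "SI2"),
  label = c(NA, "Fair", "none", NA)
)

# Collapse the search terms into one regular expression: "V|F|G|E"
pattern <- paste(vector1, collapse = "|")

# Keep rows where at least one column contains any of the terms.
# grepl() gives FALSE for NA cells, and matching is case-sensitive.
matched <- toy_df %>%
  filter(if_any(everything(), ~ grepl(pattern, .x)))

matched$id
#> [1] "a1" "b2"
```

Row "c3" is dropped because "none" only contains a lowercase "e". If case should not matter, `grepl(pattern, .x, ignore.case = TRUE)` is one option.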
CodePudding user response:
The simplest solution in this case is probably:
library(tidyverse)
vector1 <- c("V", "F", "G", "E")
diamonds %>%
filter(if_any(everything(), ~ grepl(paste(vector1, collapse = "|"),.))) %>%
head()
#> # A tibble: 6 x 10
#> carat cut color clarity depth table price x y z
#> <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
#> 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43
#> 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31
#> 3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31
#> 4 0.29 Premium I VS2 62.4 58 334 4.2 4.23 2.63
#> 5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75
#> 6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48
Created on 2023-01-24 with reprex v2.0.2
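Note that `grepl()` does substring matching, which is why rows with "Very Good" or "VS1" show up. The attempts in the question used `%in%` and `==`, so if the intent was for a whole cell to equal one of the terms, a sketch along these lines might be closer. It assumes that converting every column with `as.character()` is acceptable for the comparison; the name `exact` is just illustrative.

```r
library(dplyr)

vector1 <- c("V", "F", "G", "E")

# Keep rows where at least one cell is exactly equal to one of the terms.
# as.character() lets factor and numeric columns be compared with strings;
# NA %in% vector1 is FALSE, so NA cells do not match.
exact <- ggplot2::diamonds %>%
  filter(if_any(everything(), ~ as.character(.x) %in% vector1))

# In diamonds, only the color column can hold "E", "F" or "G" as a whole value
unique(as.character(exact$color))
```

With this version the cut value "Good" no longer counts as a match for "G", because the whole cell has to equal the term.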