I have a dataset that contains NA values. I'm filtering it with grepl by passing in search strings, and was hoping that "*" would return all rows.
df <- structure(list(`Subject description` = c("Art & Design", "Chinese",
"Classical Greek", "D&T Product Design", "Drama & Theatre Studies"
), `Discount code` = c(NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_)), row.names = c(NA, -5L), class = c("tbl_df",
"tbl", "data.frame"))
library(dplyr)
search <- "*"
df %>% filter(grepl(search, `Discount code`))
The above returns an empty data frame. Is there a way for grepl to match NA values? I appreciate that I could OR the filter with is.na(`Discount code`), but my code builds the filter from the search string, and I don't want NA values returned when any other search string is provided.
CodePudding user response:
Would it be OK for you to replace the NAs with ""? Then the search string "*" would return all rows:
library(dplyr)
library(tidyr)
df %>%
replace_na(list("Discount code" = "")) %>%
filter(grepl("*", `Discount code`))
#> # A tibble: 5 x 2
#> `Subject description` `Discount code`
#> <chr> <chr>
#> 1 Art & Design ""
#> 2 Chinese ""
#> 3 Classical Greek ""
#> 4 D&T Product Design ""
#> 5 Drama & Theatre Studies ""
Created on 2021-12-10 by the reprex package (v2.0.1)
CodePudding user response:
Since grepl returns only TRUE or FALSE, you can combine is.na with your grepl statement:
search <- "b"
df %>% filter(is.na(`Discount code`) | grepl(search, `Discount code`))
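Note that this keeps the NA rows for every search string. If the NA rows should only come back for the "*" wildcard, as the question asks, the is.na test can be gated on the search string. A minimal sketch in base R, where the `codes` vector is made-up sample data rather than anything from the question:

```r
# Made-up sample values for illustration (not from the question's data)
codes <- c(NA, "SPRING10", NA, "ART5")

# grepl() treats missing inputs as non-matches, so NA elements give FALSE
grepl("ART", codes)

# Only let NA elements through when the search string is the "*" wildcard
search <- "ART"
keep <- grepl(search, codes) | (search == "*" & is.na(codes))
codes[keep]  # "ART5"
```

The same logical expression should work unchanged inside filter(), with `Discount code` in place of `codes`.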
CodePudding user response:
I ended up creating a custom function to do this:
greplna <- function(data, reg = "*", var = "Discount code") {
  x <- data[[var]]
  if (reg == "*") {
    # Wildcard: keep the NA rows as well
    tmp <- grepl("*", x) | is.na(x)
  } else {
    tmp <- grepl(reg, x)
  }
  return(tmp)
}
You can then use this in a dplyr statement:
df %>% filter(greplna(., search, "Discount code"))
However, don't use it after a group_by, as the . refers to the whole dataset, not the individual groups.
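One possible way around the grouping caveat is to pass the column itself instead of the data frame, so the helper only ever sees the rows dplyr hands it. This is a sketch, not a tested drop-in replacement: `greplna_vec` is a hypothetical name and `codes` is made-up sample data.

```r
# Hypothetical variant: takes the column vector rather than the data frame
greplna_vec <- function(x, reg = "*") {
  if (identical(reg, "*")) {
    # Wildcard sentinel: keep every element, NA included
    return(rep(TRUE, length(x)))
  }
  # For any other pattern, NA elements are non-matches
  grepl(reg, x)
}

# Made-up sample values for illustration
codes <- c(NA, "SPRING10", NA, "ART5")
greplna_vec(codes)         # every element kept
greplna_vec(codes, "ART")  # only the "ART5" element matches
```

In a pipeline this would be called as filter(greplna_vec(`Discount code`, search)), which should behave the same whether or not the data is grouped, because no . is involved.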