Using grepl with "*" to return NA values in R dplyr

Time:12-10

I have a dataset that contains NA values. I'm filtering it with grepl by passing in search strings, and I was hoping to use "*" to return all values.

df <- structure(list(`Subject description` = c("Art & Design", "Chinese", 
"Classical Greek", "D&T Product Design", "Drama & Theatre Studies"
), `Discount code` = c(NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_)), row.names = c(NA, -5L), class = c("tbl_df", 
"tbl", "data.frame"))

library(dplyr)

search <- "*"

df %>% filter(grepl(search, `Discount code`))

The above returns an empty data frame. Is there a way to make grepl return the NA values? I appreciate that I could OR the filter with is.na(`Discount code`), but my code relies on the search string, and I don't want NA values returned when a different search string is provided.
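A quick way to see why the filter comes back empty: grepl() treats a missing element as a non-match and returns FALSE for it, so no pattern can let an NA row through on its own. A minimal sketch with a made-up `codes` vector (an assumption, not the data above):

```r
# Hypothetical vector of discount codes with one missing value
codes <- c("SAVE10", NA, "SUMMER")

# grepl() returns FALSE, not NA, for the missing element,
# so filter() drops that row whatever the pattern is
grepl("SAVE", codes)
#> [1]  TRUE FALSE FALSE

# NA rows therefore have to be kept explicitly with is.na()
grepl("SAVE", codes) | is.na(codes)
#> [1]  TRUE  TRUE FALSE
```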

Answer:

Would it be OK for you to replace the NAs with ""? Then you could return all rows by searching for "*":

library(dplyr)
library(tidyr)

df %>%
  replace_na(list("Discount code" = "")) %>%
  filter(grepl("*", `Discount code`))

#> # A tibble: 5 x 2
#>   `Subject description`   `Discount code`
#>   <chr>                   <chr>          
#> 1 Art & Design            ""             
#> 2 Chinese                 ""             
#> 3 Classical Greek         ""             
#> 4 D&T Product Design      ""             
#> 5 Drama & Theatre Studies ""

Created on 2021-12-10 by the reprex package (v2.0.1)
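A variant of this idea, sketched under assumptions: instead of overwriting the column with replace_na(), substitute "" only inside the filter condition using dplyr::coalesce(), so the rows that come back still contain NA. The `df2` sample data, the hypothetical `filter_codes()` helper, and the use of ".*" (rather than "*") as the match-everything pattern are illustrative choices, not part of the original answer.

```r
library(dplyr)

# Hypothetical sample data: one NA code and two real codes
df2 <- tibble(
  `Subject description` = c("Art & Design", "Chinese", "Drama & Theatre Studies"),
  `Discount code` = c(NA, "SAVE10", "SUMMER")
)

# Replace NA with "" only for the match; the returned column keeps its NAs
filter_codes <- function(data, search) {
  filter(data, grepl(search, coalesce(`Discount code`, "")))
}

filter_codes(df2, ".*")   # ".*" also matches "", so all 3 rows are kept
filter_codes(df2, "SAVE") # only the SAVE10 row
```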

Answer:

Since grepl() returns only TRUE or FALSE (a missing element gives FALSE, never NA), you can combine is.na() with your grepl() call:

search <- "b"

df %>% filter(is.na(`Discount code`) | grepl(search, `Discount code`))
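The is.na() version above keeps NA rows for every search string. If NA rows should appear only for the "*" wildcard, one option is to treat "*" as a sentinel that switches the filter off instead of passing it to grepl(). This is a sketch; the `df2` data and the `filter_by_code()` helper are hypothetical.

```r
library(dplyr)

# Hypothetical sample data
df2 <- tibble(`Discount code` = c(NA, "SAVE10", "SUMMER"))

# "*" is a sentinel meaning "no filtering" (a single TRUE keeps every row);
# any other string is used as a regex, so NA rows are dropped
filter_by_code <- function(data, search) {
  filter(data, if (identical(search, "*")) TRUE else grepl(search, `Discount code`))
}

filter_by_code(df2, "*")      # all 3 rows, NA included
filter_by_code(df2, "SUMMER") # only the SUMMER row
```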

Answer:

I ended up creating a custom function to do this:

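greplna <- function(data, reg = "*", var = "Discount code") {
  x <- data[[var]]  # pull the column out as a plain vector
  if (reg == "*") {
    # "*" means "return everything": keep matches and NA rows
    tmp <- grepl("*", x) | is.na(x)
  } else {
    # any other pattern: grepl() gives FALSE for NA, so NA rows are dropped
    tmp <- grepl(reg, x)
  }
  return(tmp)
}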

You can then use this in a dplyr statement:

df %>% filter(greplna(., search, "Discount code"))

But don't use it after group_by(): the . placeholder refers to the whole dataset, not to the individual groups.
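One way around the group_by() limitation, sketched here as an assumption rather than the answerer's code, is to pass the column itself instead of the whole data frame, so filter() hands the helper only the rows of each group. The `grepl_na()` name and the `df2` sample data are hypothetical.

```r
library(dplyr)

# Hypothetical vector-based helper: it receives the column itself, so inside a
# grouped filter() it only ever sees the rows of the current group
grepl_na <- function(x, pattern = "*") {
  if (identical(pattern, "*")) {
    rep(TRUE, length(x))  # "*" keeps every row, NA included
  } else {
    grepl(pattern, x)     # NA elements give FALSE, so they are dropped
  }
}

# Hypothetical grouped sample data
df2 <- tibble(
  group = c("a", "a", "b"),
  `Discount code` = c(NA, "SAVE10", "SUMMER")
)

df2 %>%
  group_by(group) %>%
  filter(grepl_na(`Discount code`, "*"))
```

Because the helper never uses ., it behaves the same with or without group_by().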
