How to replace certain strings with NA, na_if, if_else, regex

Time:07-16

I have a character variable that has some values that I want replaced by NA (e.g. "N/A"; "NA" entered as text, not R's NA type; other text responses.) The values I don't want replaced by NA contain number strings, so I tried using a regular expression to select the non-number strings to replace with NA.

I'm able to filter for the non-number cases using the following, or for the number-string cases if I remove the "!". I've been unable to figure out how to use mutate() with if_else() and str_detect(), or na_if() with str_detect(), to replace these cases. I've only been able to replace cases if I specify them exactly with na_if().

library(dplyr)
library(stringr)

df <- data.frame(var1 = c("84950", "NA", "N/A", "84596/03456", "55555", NA), 
                 var2 = rep("10000", 6))

df %>% 
  filter(!str_detect(var1, "[:digit:]"))

This doesn't work:

df %>% 
  mutate(var1 = if_else(str_detect(var1, "[:digit:]"), var1, NA))

This doesn't work either; it leaves all the cases as they are:

df %>% 
  mutate(var1 = na_if(var1, !str_detect(var1, "[:digit:]"))) 

This works, but it only replaces the one value "N/A":

df %>% 
  mutate(var1 = na_if(var1, "N/A"))

CodePudding user response:

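Your if_else() attempt is close. if_else() is strict about types (at least in the dplyr releases current when this was written), and a bare NA is logical, so you need to specify that NA is a character type.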

df |>
  mutate(var1 = if_else(str_detect(var1, "[:digit:]"), var1, NA_character_))

Output:

         var1  var2
1       84950 10000
2        <NA> 10000
3        <NA> 10000
4 84596/03456 10000
5       55555 10000
6        <NA> 10000
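
To see why the typed NA matters and why row 6 stays missing, it can help to look at the pieces separately. This is only a sketch on the question's example values; has_digit is a name introduced here for illustration.

```r
library(stringr)

var1 <- c("84950", "NA", "N/A", "84596/03456", "55555", NA)

# str_detect() returns NA (not FALSE) when the input itself is missing
has_digit <- str_detect(var1, "[:digit:]")
has_digit
#> [1]  TRUE FALSE FALSE  TRUE  TRUE    NA

# A bare NA is logical; NA_character_ is the character-typed missing value
class(NA)
#> [1] "logical"
class(NA_character_)
#> [1] "character"
```

Because the condition is NA for the sixth element, if_else() returns a missing value there as well, which is what the output above shows.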

CodePudding user response:

Here is an alternative approach using the %in% operator:

library(dplyr)
df %>% 
  mutate(var1 = ifelse(var1 %in% c("N/A", "NA"), NA_character_, var1))

Output:

         var1  var2
1       84950 10000
2        <NA> 10000
3        <NA> 10000
4 84596/03456 10000
5       55555 10000
6        <NA> 10000
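
One detail worth checking in this approach is how %in% treats the value that is already missing. The following is a small sketch on the same example values; is_placeholder is a name made up for illustration.

```r
var1 <- c("84950", "NA", "N/A", "84596/03456", "55555", NA)

# %in% never returns NA: a missing value simply does not match
is_placeholder <- var1 %in% c("N/A", "NA")
is_placeholder
#> [1] FALSE  TRUE  TRUE FALSE FALSE FALSE

# == propagates the missing value instead
var1 == "NA"
#> [1] FALSE  TRUE FALSE FALSE FALSE    NA
```

So the real NA in row 6 falls through to the "keep var1" branch and stays NA. Note that this approach only catches the strings listed explicitly; other text responses would need to be added to the vector.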

CodePudding user response:

Another option is to use replace(), like this:

library(dplyr)

df <- data.frame(var1 = c("84950", "NA", "N/A", "84596/03456", "55555", NA), 
                 var2 = rep("10000", 6))

df %>%
  mutate(across(var1, ~ replace(., . %in% c("N/A", "NA"), NA)))
#>          var1  var2
#> 1       84950 10000
#> 2        <NA> 10000
#> 3        <NA> 10000
#> 4 84596/03456 10000
#> 5       55555 10000
#> 6        <NA> 10000

Created on 2022-07-15 by the reprex package (v2.0.1)
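
If the goal is to catch any text response without listing each one, replace() can also be combined with a regular expression. This is a hedged base R sketch, assuming "contains no digit" is the right rule for the data; no_digit is a name introduced here.

```r
df <- data.frame(var1 = c("84950", "NA", "N/A", "84596/03456", "55555", NA),
                 var2 = rep("10000", 6))

# grepl() returns FALSE (not NA) for a missing input, so the existing NA
# also counts as "no digit" and is simply overwritten with NA again
no_digit <- !grepl("[0-9]", df$var1)

df$var1 <- replace(df$var1, no_digit, NA)
is.na(df$var1)
#> [1] FALSE  TRUE  TRUE FALSE FALSE  TRUE
```

The number strings in rows 1, 4 and 5 are left untouched, and every other row becomes a true NA.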
