I would like to replace values with NA in specific rows if a specific character is found within the current value. For example, if a value contains "<" (less than), such as "<7.5", I would like to replace the whole value with NA.
Examples:
Column A: 3, 4, 8, <5.6, 1, 3
Column B: 7, 4, <6, 1, <2.2, 8
should be converted to:
Column A: 3, 4, 8, NA, 1, 3
Column B: 7, 4, NA, 1, NA, 8
I found this example (https://dplyr.tidyverse.org/reference/na_if.html) with mutate() and na_if(), but it requires matching the whole string, e.g.
y <- c("abc", "def", "", "ghi")
na_if(y, "def")
So "def" would be replaced by NA. But if I use
y <- c("abc", "def", "", "ghi")
na_if(y, "ef")
nothing is replaced. There is also an example with
library(dplyr)
data <- starwars
data %>%
  select(name, eye_color) %>%
  mutate(name = na_if(name, "Luke Skywalker")) %>%
  mutate(eye_color = na_if(eye_color, "unknown")) -> dataedited
This code works perfectly for me, but it also needs an exact match instead of just a part of the string. This way I could edit each column manually, but maybe there is a way to do this across multiple columns. I would like to convert values to NA if name contains "sky" or eye_color contains "unkn".
Can anyone help me?
Thank you!
CodePudding user response:
na_if() won't take more than one element in y. Instead, we can build a logical vector and pass it to replace() to set the matching values to NA. For multiple columns, use across():
library(dplyr)
data <- data %>%
  mutate(across(c(name, eye_color),
                ~ replace(., . %in% c("Luke Skywalker", "unknown"), NA)))
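For a partial match, use a regex in str_detect() or grepl(). Matching is case-sensitive by default, so ignore_case = TRUE is needed for "sky" to match "Luke Skywalker":
library(stringr)
data <- data %>%
  mutate(across(c(name, eye_color),
                ~ replace(., str_detect(., regex("sky|unkn", ignore_case = TRUE)), NA)))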
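The grepl() route mentioned above can be sketched in base R for the data in the question. This is only an illustration: the data frame df and the column names A and B are assumptions built from the question's example, and fixed = TRUE is used so that "<" is matched literally rather than as a regex.

```r
# Hypothetical data frame mirroring the question's columns A and B.
# The columns are character because of the "<..." entries.
df <- data.frame(
  A = c("3", "4", "8", "<5.6", "1", "3"),
  B = c("7", "4", "<6", "1", "<2.2", "8"),
  stringsAsFactors = FALSE
)

cols <- c("A", "B")
df[cols] <- lapply(df[cols], function(col) {
  # Blank out every value that contains a literal "<"
  col[grepl("<", col, fixed = TRUE)] <- NA
  # Everything left is a plain number, so this conversion raises no warning
  as.numeric(col)
})

df$A  # 3, 4, 8, NA, 1, 3
df$B  # 7, 4, NA, 1, NA, 8
```

Listing the columns in cols keeps the replacement limited to the columns that are known to hold censored values.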
CodePudding user response:
I've also found that na_if() wasn't flexible enough, so I often use my own version, na_predicate(). It takes two arguments: the vector to edit, and a predicate function that returns TRUE or FALSE.
For your situation, you can combine it with dplyr's across() to edit multiple columns.
library(dplyr)
library(stringr)
na_predicate <- function(x, fn) {
  predicate <- rlang::as_function(fn)
  x[predicate(x)] <- NA
  x
}
# Example of a simple predicate function. By default, it's applied to the vector
# to change
is_even <- function(x) x %% 2 == 0
na_predicate(1:10, is_even)
#> [1] 1 NA 3 NA 5 NA 7 NA 9 NA
# But you can use the formula notation to make it apply to something else
# instead
na_predicate(c("a", "b", "c", "d"), ~ is_even(1:4))
#> [1] "a" NA "c" NA
# Applying it to starwars data. Here's the original:
original_data <- starwars %>%
  select(name, eye_color, skin_color) %>%
  head() %>%
  print()
#> # A tibble: 6 x 3
#>   name           eye_color skin_color
#>   <chr>          <chr>     <chr>
#> 1 Luke Skywalker blue      fair
#> 2 C-3PO          yellow    gold
#> 3 R2-D2          red       white, blue
#> 4 Darth Vader    yellow    white
#> 5 Leia Organa    brown     light
#> 6 Owen Lars      blue      light
# And here I'm using na_predicate() to turn any value in the name/eye_color
# columns that contains an "l" into NA:
original_data %>%
  mutate(across(c(name, eye_color),
                na_predicate, ~ str_detect(., "l")))
#> # A tibble: 6 x 3
#>   name        eye_color skin_color
#>   <chr>       <chr>     <chr>
#> 1 <NA>        <NA>      fair
#> 2 C-3PO       <NA>      gold
#> 3 R2-D2       red       white, blue
#> 4 Darth Vader <NA>      white
#> 5 Leia Organa brown     light
#> 6 Owen Lars   <NA>      light
Created on 2021-11-09 by the reprex package (v2.0.1)
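As a rough sketch of how the same helper could be applied to the "<" values from the question: the tibble df, its columns A and B, and the helper has_less_than() are assumptions made up for illustration, and na_predicate() is repeated only so the snippet runs on its own.

```r
library(dplyr)

# Same helper as defined in the answer above
na_predicate <- function(x, fn) {
  predicate <- rlang::as_function(fn)
  x[predicate(x)] <- NA
  x
}

# Hypothetical data mirroring the question's columns A and B
df <- tibble(
  A = c("3", "4", "8", "<5.6", "1", "3"),
  B = c("7", "4", "<6", "1", "<2.2", "8")
)

# Hypothetical predicate: TRUE where the value contains a literal "<"
has_less_than <- function(v) grepl("<", v, fixed = TRUE)

cleaned <- df %>%
  mutate(across(c(A, B),
                function(col) as.numeric(na_predicate(col, has_less_than))))

cleaned$A  # 3, 4, 8, NA, 1, 3
cleaned$B  # 7, 4, NA, 1, NA, 8
```

Wrapping the call in an explicit function(col) avoids passing extra arguments through across(), which newer dplyr versions discourage.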
CodePudding user response:
Just convert the column to numeric; the components that are not numeric will be converted to NA. This generates warnings, but they can be suppressed.
Alternatively, the second approach below checks for any character that is neither a digit nor a dot, uses NA for those values, and then converts to numeric, so no warnings arise in the first place.
The third approach is the same except that it assumes the values to be converted to NA all contain <.
The fourth approach replaces any component starting with < with just < and then uses na_if().
x <- c(7, 4, "<6", 1, "<2.2", 8)
# 1
suppressWarnings(as.numeric(x))
## [1] 7 4 NA 1 NA 8
# 2
as.numeric(ifelse(grepl("[^0-9.]", x), NA, x))
## [1] 7 4 NA 1 NA 8
# 3
as.numeric(ifelse(grepl("<", x), NA, x))
## [1] 7 4 NA 1 NA 8
# 4
library(dplyr)
as.numeric(na_if(sub("<.*", "<", x), "<"))
## [1] 7 4 NA 1 NA 8
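The approaches give the same result on x, but they can diverge on other inputs. The sketch below uses a made-up vector x2 (an assumption, not data from the question) containing a negative number and a value in scientific notation to show where approach 2 is stricter than approaches 1 and 3.

```r
# Hypothetical input: a negative number, scientific notation, and a "<" value
x2 <- c("7", "-1", "1e3", "<6")

# Approach 1: anything as.numeric() can parse is kept
a1 <- suppressWarnings(as.numeric(x2))
a1  # 7, -1, 1000, NA

# Approach 2: "-" and "e" are neither digits nor dots, so those values become NA
a2 <- as.numeric(ifelse(grepl("[^0-9.]", x2), NA, x2))
a2  # 7, NA, NA, NA

# Approach 3: only values containing "<" become NA
a3 <- as.numeric(ifelse(grepl("<", x2), NA, x2))
a3  # 7, -1, 1000, NA
```

If the data may contain negative values or exponents, approach 3 (or approach 1 with suppressed warnings) is probably the safer choice.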
If we have several values that we wish to map to NA, or a regex pattern to match, then use replace() like this:
y <- head(letters)
# 5
replace(y, y %in% c("a", "c"), NA)
## [1] NA "b" NA "d" "e" "f"
# 6
replace(y, grepl("a|c", y), NA)
## [1] NA "b" NA "d" "e" "f"