See the code below:
library(dplyr)
df <- tibble(
  x1 = c(0, 1, 2, 3),
  x2 = c(0, NA, 1, NA),
  x3 = as.double(NA)
)
# Column-wise: replace zeros with NA in each column
df %>%
  mutate(x1 = ifelse(x1 == 0, NA, x1)) %>%
  mutate(x2 = ifelse(x2 == 0, NA, x2)) %>%
  mutate(x3 = ifelse(x3 == 0, NA, x3)) %>%
  str()
# Same replacement, evaluated row by row
df %>%
  rowwise() %>%
  mutate(x1 = ifelse(x1 == 0, NA, x1)) %>%
  mutate(x2 = ifelse(x2 == 0, NA, x2)) %>%
  mutate(x3 = ifelse(x3 == 0, NA, x3)) %>%
  str()
Column x3 is converted to logical, which recently caused a problem in my code.
Is this expected behavior or a bug?
I don't understand the logic, since columns x1 and x2 keep their numeric type.
CodePudding user response:
In R, NA is a length-1 logical vector:
class(NA)
#> [1] "logical"
The equivalent missing value for numeric data is NA_real_. This is usually overlooked because NA_real_ is printed as NA:
NA_real_
#> [1] NA
class(NA_real_)
#> [1] "numeric"
When you create a numeric vector that contains NA values, each NA is coerced to NA_real_:
dput(c(1, NA)[2])
#> NA_real_
In your example, you have already explicitly converted x3 to double, so the column is filled with NA_real_:
class(as.double(NA))
#> [1] "numeric"
dput(as.double(NA))
#> NA_real_
This is all as expected. But inside ifelse, the first argument (test) is always logical. In your case, x3 is the equivalent of:
c(NA_real_, NA_real_, NA_real_, NA_real_)
But the expression c(NA_real_, NA_real_, NA_real_, NA_real_) == 0 returns a logical vector, since it is a logical test; you are asking "are these values equal to zero?".
class(c(NA_real_, NA_real_, NA_real_, NA_real_) == 0)
#> [1] "logical"
ifelse has arguments for what to return where the test is TRUE (yes) and where it is FALSE (no), but none for where the test is NA. In those positions it returns the logical NA produced by the comparison.
In the case of column x2, ifelse returns one numeric value, so the whole result is coerced to numeric and the other three logical NA values in that column become NA_real_:
x2 <- c(0, NA, 2, NA)
ifelse(x2 == 0, NA, x2)
#> [1] NA NA 2 NA
dput(ifelse(x2 == 0, NA, x2)[1])
#> NA_real_
However, in the final column, only logical NA values are returned, and nothing in your code converts them to NA_real_, so the column remains a logical NA column.
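The coercion described above can be reproduced with a simplified stand-in for ifelse. This is only a sketch: simple_ifelse is a hypothetical helper written for illustration, not the actual source of base::ifelse, and it assumes the result starts out as the logical test vector and is only coerced when a value of another type is assigned into it.

```r
# Hypothetical, simplified stand-in for base::ifelse(), for illustration only.
simple_ifelse <- function(test, yes, no) {
  ans <- test                 # the result starts as the logical test vector
  is_true  <- which(test)     # positions where test is TRUE (NA is dropped)
  is_false <- which(!test)    # positions where test is FALSE (NA is dropped)
  if (length(is_true) > 0) {
    ans[is_true] <- rep(yes, length.out = length(ans))[is_true]
  }
  if (length(is_false) > 0) {
    # Assigning a numeric value here coerces the whole result to numeric
    ans[is_false] <- rep(no, length.out = length(ans))[is_false]
  }
  ans
}

x2 <- c(0, NA, 1, NA)
x3 <- rep(NA_real_, 4)

class(simple_ifelse(x2 == 0, NA, x2))  # "numeric": one FALSE position pulls in a number
class(simple_ifelse(x3 == 0, NA, x3))  # "logical": the test is all NA, nothing is assigned
```

For x3 neither branch is ever used, so the untouched logical test vector is what comes back.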
There are a few possible solutions, but the way to do this in dplyr is to use if_else instead of ifelse, since it does the same job but is strict about types: the result always has the type of the true and false values. To keep the types consistent you will also need to specify NA_real_ (dplyr 1.1.0 and later also accept a plain NA here):
df %>%
  mutate(x1 = if_else(x1 == 0, NA_real_, x1)) %>%
  mutate(x2 = if_else(x2 == 0, NA_real_, x2)) %>%
  mutate(x3 = if_else(x3 == 0, NA_real_, x3)) %>%
  str()
#> tibble [4 x 3] (S3: tbl_df/tbl/data.frame)
#> $ x1: num [1:4] NA 1 2 3
#> $ x2: num [1:4] NA NA 1 NA
#> $ x3: num [1:4] NA NA NA NA
Created on 2022-12-18 with reprex v2.0.2
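As one of the other possible solutions, here is a minimal base R sketch that avoids ifelse altogether. The helper name zero_to_na is made up for this example. It assumes you only want to blank out values that are known to be 0 and leave existing NA values untouched; because it modifies the original vector in place, the type cannot change.

```r
# Hypothetical helper: set known zeros to NA without changing the vector's type.
# which() drops NA comparison results, so positions that are already NA are skipped.
zero_to_na <- function(x) replace(x, which(x == 0), NA)

x2 <- c(0, NA, 1, NA)
x3 <- rep(NA_real_, 4)

zero_to_na(x2)          # NA NA 1 NA
typeof(zero_to_na(x2))  # "double"
typeof(zero_to_na(x3))  # "double": nothing to replace, x3 comes back unchanged

# If dplyr is installed, na_if() expresses the same intent in one call.
if (requireNamespace("dplyr", quietly = TRUE)) {
  print(typeof(dplyr::na_if(x3, 0)))
}
```

Inside a pipeline this could be used as mutate(x3 = zero_to_na(x3)), or with na_if(x3, 0) directly.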