This seems to be a fairly simple task, but I couldn't figure it out after studying the documentation of ifelse(), dplyr::if_else() and several similar posts on SO about applying ifelse() to multiple columns in a data frame.
My goal: I have the following data frame with columns of different data types. On each row, I want to reset the values in the first three columns to NA if the column valid is FALSE.
The problem: I used dplyr::across() and ifelse() to change the values as I wanted, but the date column date and the factor column team were coerced to numeric (as shown in the reprex below), which wasn't desirable. I know that dplyr::if_else() preserves data types, but it doesn't work across columns of different data types, either.
I know tdf[tdf$valid == FALSE, !grepl("valid", names(tdf))] <- NA could achieve my goal, but I prefer a tidyverse approach, which I could use in my data cleaning pipeline. Many thanks in advance!
library(dplyr)
tdf <- tibble(
  date = c(as.Date("2021-12-10"), as.Date("2021-12-11")),
  team = factor(1:2, labels = c("T1", "T2")),
  score = 3:4,
  valid = c(TRUE, FALSE)
)
tdf
#> # A tibble: 2 x 4
#>   date       team  score valid
#>   <date>     <fct> <int> <lgl>
#> 1 2021-12-10 T1        3 TRUE
#> 2 2021-12-11 T2        4 FALSE
tdf %>% mutate(across(-valid, ~ ifelse(valid, ., NA)))
#> # A tibble: 2 x 4
#>    date  team score valid
#>   <dbl> <int> <int> <lgl>
#> 1 18971     1     3 TRUE
#> 2    NA    NA    NA FALSE
Created on 2021-12-10 by the reprex package (v2.0.1)
CodePudding user response:
Use case_when(): when no condition matches for a row, it returns an NA of the same type as the other outputs, so each column keeps its class.
library(dplyr)
tdf %>%
  mutate(across(-valid, ~ case_when(valid ~ .)))
-output
# A tibble: 2 × 4
  date       team  score valid
  <date>     <fct> <int> <lgl>
1 2021-12-10 T1        3 TRUE
2 NA         <NA>     NA FALSE
Another option is replace():
tdf %>%
  mutate(across(-valid, ~ replace(., !valid, NA)))
# A tibble: 2 × 4
  date       team  score valid
  <date>     <fct> <int> <lgl>
1 2021-12-10 T1        3 TRUE
2 NA         <NA>     NA FALSE
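For completeness, if_else() can also be made to work here when the missing value has the same type as the column. The sketch below is a variation and not part of the answer above: it assumes that indexing a vector with NA_integer_ is an acceptable way to build a typed missing value, and it rebuilds tdf from the question so that it runs on its own.

```r
library(dplyr)

# Same data as in the question
tdf <- tibble(
  date = c(as.Date("2021-12-10"), as.Date("2021-12-11")),
  team = factor(1:2, labels = c("T1", "T2")),
  score = 3:4,
  valid = c(TRUE, FALSE)
)

# .x[NA_integer_] is a length-one missing value that keeps the class
# (and, for factors, the levels) of the column being processed, so the
# `true` and `false` arguments of if_else() have matching types.
res <- tdf %>%
  mutate(across(-valid, ~ if_else(valid, .x, .x[NA_integer_])))

res
```

The point of the design is only that the replacement value is derived from the column itself, so no per-type NA constant has to be chosen by hand.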
According to ?ifelse:
The mode of the result may depend on the value of test (see the examples), and the class attribute (see oldClass) of the result is taken from test and may be inappropriate for the values selected from yes and no.
Sometimes it is better to use a construction such as (tmp <- yes; tmp[!test] <- no[!test]; tmp), possibly extended to handle missing values in test.
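The construction quoted from ?ifelse can be wrapped in a small helper. This is a minimal sketch in base R, not a drop-in replacement for ifelse(): the name keep_class_ifelse is hypothetical, it assumes yes has the same length as test, and it treats a missing test as a missing result.

```r
# Hypothetical helper (not part of base R or dplyr): start from `yes` so that
# its class and attributes are kept, then overwrite the positions where
# `test` is FALSE or missing.
keep_class_ifelse <- function(test, yes, no) {
  stopifnot(is.logical(test), length(yes) == length(test))
  no <- rep(no, length.out = length(test))  # recycle a scalar such as NA
  out <- yes
  replace_at <- !test & !is.na(test)        # positions where test is FALSE
  out[replace_at] <- no[replace_at]
  out[is.na(test)] <- NA                    # a missing test gives a missing result
  out
}

dates <- as.Date(c("2021-12-10", "2021-12-11"))
teams <- factor(1:2, labels = c("T1", "T2"))
valid <- c(TRUE, FALSE)

res_dates <- keep_class_ifelse(valid, dates, NA)
res_teams <- keep_class_ifelse(valid, teams, NA)

class(res_dates)  # "Date"
class(res_teams)  # "factor"
```

Because the result starts as a copy of yes, the assignment methods for Date and factor take care of the conversion, which is why the class survives.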
CodePudding user response:
This is a known limitation of ifelse() that has not been fixed.
@akrun gives a very good explanation and shows how to solve your specific problem!
But in case you want to keep ifelse():
The only working solution (with dates and factors) that I know of for your specific case was provided by Fabian Werner in 2015, using a custom safe.ifelse, in the question "How to prevent ifelse() from turning Date objects into numeric objects":
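safe.ifelse <- function(cond, yes, no) {
  # ifelse() drops attributes, so remember the class (and factor levels) of `yes`
  class.y <- class(yes)
  levels.y <- levels(yes)
  X <- ifelse(cond, yes, no)
  if (is.factor(yes)) {
    # ifelse() returned the integer codes; map each code back to its original
    # level so that a missing code cannot shift the labels
    X <- factor(X, levels = seq_along(levels.y), labels = levels.y)
  } else {
    class(X) <- class.y
  }
  return(X)
}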
tdf %>% mutate(across(-valid, ~ safe.ifelse(valid, ., NA)))
  date       team  score valid
  <date>     <fct> <int> <lgl>
1 2021-12-10 T1        3 TRUE
2 NA         <NA>     NA FALSE
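One detail is worth checking when a factor is rebuilt from the integer codes that ifelse() returns: the labels have to be matched to the codes, not to the position of the codes that happen to be present. The sketch below uses base R only and assumes a variation of the question's data in which the first row is the invalid one.

```r
lv <- c("T1", "T2")
teams <- factor(c("T1", "T2"), levels = lv)
valid <- c(FALSE, TRUE)  # assumed variation: the first row is invalid

# ifelse() strips the factor class and returns the integer codes: NA, 2
codes <- ifelse(valid, teams, NA)

# Relabelling after as.factor() assigns labels by position among the codes
# that are present, so code 2 ends up labelled "T1"
relabelled <- as.factor(codes)
levels(relabelled) <- lv

# Mapping every possible code to its original level keeps the right label
mapped <- factor(codes, levels = seq_along(lv), labels = lv)

as.character(relabelled)  # NA "T1"
as.character(mapped)      # NA "T2"
```

With the data from the question the two approaches agree, because the only surviving code is 1, so the difference shows up only when a lower code is missing from the result.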