Preserving column types when applying ifelse() across columns of different types in R

Time:12-11

This seems to be a fairly simple task, but I couldn't figure it out after studying the documentation of ifelse(), dplyr::if_else() and several similar posts on SO about applying ifelse() to multiple columns in a data frame.

My goal: I have the following data frame with columns of different data types. In each row, I want to set the values in the first three columns to NA if column valid is FALSE.

The problem: I used dplyr::across() and ifelse() to change the values as I wanted, but the Date column date and the factor column team were coerced to plain numbers (double and integer codes, respectively, as shown in the reprex below), which is not what I want. I know that dplyr::if_else() preserves data types, but it doesn't work across columns of different data types either.
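
For reference, dplyr::if_else() insists that `true` and `false` have compatible types, and depending on the dplyr version a bare logical NA may be rejected as the `false` value for a Date or factor column. A minimal sketch of one workaround (my own assumption, not taken from the answers below): build an NA of the right type from each column itself with `.x[NA_integer_]`, which keeps the class and, for factors, the levels.

```r
library(dplyr)

tdf <- tibble(
  date = c(as.Date("2021-12-10"), as.Date("2021-12-11")),
  team = factor(1:2, labels = c("T1", "T2")),
  score = 3:4,
  valid = c(TRUE, FALSE)
)

# .x[NA_integer_] is a length-one NA with the same class as .x
# (an NA Date, an NA factor with the same levels, or NA_integer_),
# so if_else() sees matching types in `true` and `false`
res <- tdf %>%
  mutate(across(-valid, ~ if_else(valid, .x, .x[NA_integer_])))

res
```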

I know tdf[tdf$valid == FALSE, !grepl("valid", names(tdf))] <- NA could achieve my goal, but I prefer a tidyverse approach, which I could use in my data cleaning pipeline. Many thanks in advance!
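
For comparison, here is that base R assignment run on the same data, as a sketch; the class check at the end is only there to show that nothing was coerced. Assigning into existing columns with `[<-` keeps each column's class, because NA is written into the existing vectors instead of the columns being rebuilt by ifelse().

```r
library(dplyr)

tdf <- tibble(
  date = c(as.Date("2021-12-10"), as.Date("2021-12-11")),
  team = factor(1:2, labels = c("T1", "T2")),
  score = 3:4,
  valid = c(TRUE, FALSE)
)

# Rows where valid is FALSE, every column except valid
tdf[tdf$valid == FALSE, !grepl("valid", names(tdf))] <- NA

# Each column keeps its original class
sapply(tdf, class)
```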

library(dplyr)

tdf <- tibble(
  date = c(as.Date("2021-12-10"), as.Date("2021-12-11")),
  team = factor(1:2, labels = c("T1", "T2")),
  score = 3:4,
  valid = c(TRUE, FALSE)
)

tdf
#> # A tibble: 2 x 4
#>   date       team  score valid
#>   <date>     <fct> <int> <lgl>
#> 1 2021-12-10 T1        3 TRUE 
#> 2 2021-12-11 T2        4 FALSE

tdf %>% mutate(across(-valid, ~ ifelse(valid, ., NA)))
#> # A tibble: 2 x 4
#>    date  team score valid
#>   <dbl> <int> <int> <lgl>
#> 1 18971     1     3 TRUE 
#> 2    NA    NA    NA FALSE

Created on 2021-12-10 by the reprex package (v2.0.1)

Answer:

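Make use of the default behaviour of case_when(): when no condition is TRUE for a row (here, rows where valid is FALSE), it returns an NA of the same type as the right-hand side, so each column keeps its class.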

library(dplyr)
tdf %>%
    mutate(across(-valid, ~ case_when(valid ~ .)))

Output:

# A tibble: 2 × 4
  date       team  score valid
  <date>     <fct> <int> <lgl>
1 2021-12-10 T1        3 TRUE 
2 NA         <NA>     NA FALSE
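
To see the mechanism on single vectors: positions that match no condition are filled with an NA of the output type, so a Date stays a Date and a factor keeps its levels. A small sketch; `keep`, `d` and `f` are hypothetical names used only for illustration.

```r
library(dplyr)

keep <- c(TRUE, FALSE)
d <- as.Date(c("2021-12-10", "2021-12-11"))
f <- factor(c("T1", "T2"))

# The second element matches no condition, so it becomes an NA
# of the same type as the right-hand side
d_out <- case_when(keep ~ d)
f_out <- case_when(keep ~ f)

class(d_out)
levels(f_out)
```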

Another option is replace(), which assigns NA into the existing vector and therefore keeps its class:

tdf %>% 
   mutate(across(-valid, ~ replace(., !valid, NA)))
# A tibble: 2 × 4
  date       team  score valid
  <date>     <fct> <int> <lgl>
1 2021-12-10 T1        3 TRUE 
2 NA         <NA>     NA FALSE
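
Why replace() keeps the type: in base R it is essentially `x[list] <- values; x`, so the assignment goes through the class's own `[<-` method (`[<-.Date`, `[<-.factor`) and the class survives. A short sketch on standalone vectors; `keep`, `d` and `f` are hypothetical names for illustration.

```r
keep <- c(TRUE, FALSE)
d <- as.Date(c("2021-12-10", "2021-12-11"))
f <- factor(c("T1", "T2"))

# replace(x, list, values) assigns values into x at positions list,
# dispatching to `[<-.Date` / `[<-.factor`, so the class is kept
d_out <- replace(d, !keep, NA)
f_out <- replace(f, !keep, NA)

# The same assignment written out by hand
d_manual <- d
d_manual[!keep] <- NA
identical(d_out, d_manual)
```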

The reason ifelse() loses the types is documented in ?ifelse:

The mode of the result may depend on the value of test (see the examples), and the class attribute (see oldClass) of the result is taken from test and may be inappropriate for the values selected from yes and no.

Sometimes it is better to use a construction such as (tmp <- yes; tmp[!test] <- no[!test]; tmp), possibly extended to handle missing values in test.
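
That construction can be wrapped in a small helper for use inside across(). `typed_ifelse()` below is a hypothetical name, not a function from base R or dplyr: it starts from `yes`, so the class and attributes of `yes` are kept, and it treats an NA in `test` as "return NA", which is one possible way of handling missing values in test.

```r
library(dplyr)

# Hypothetical helper built on the construction suggested in ?ifelse:
# start from `yes` (keeping its class) and overwrite where test is FALSE
typed_ifelse <- function(test, yes, no) {
  out <- rep(yes, length.out = length(test))  # rep() has Date/factor methods
  no <- rep(no, length.out = length(test))
  take_no <- !is.na(test) & !test
  out[take_no] <- no[take_no]
  out[is.na(test)] <- NA  # missing test gives a missing result
  out
}

tdf <- tibble(
  date = c(as.Date("2021-12-10"), as.Date("2021-12-11")),
  team = factor(1:2, labels = c("T1", "T2")),
  score = 3:4,
  valid = c(TRUE, FALSE)
)

res <- tdf %>% mutate(across(-valid, ~ typed_ifelse(valid, .x, NA)))
res
```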

Answer:

This is a long-standing limitation of ifelse().

@akrun gives a very good explanation of how you could solve your specific problem!

But in case you want to keep using ifelse():

A solution that works with both dates and factors in your specific case is based on a custom safe.ifelse() posted by Fabian Werner in 2015 in answer to:

How to prevent ifelse() from turning Date objects into numeric objects

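safe.ifelse <- function(cond, yes, no) {
  # Remember the class of `yes` (and its levels, if it is a factor)
  class.y <- class(yes)
  # is.factor() avoids comparing a multi-element class vector inside if()
  is.fac <- is.factor(yes)
  if (is.fac) {
    levels.y <- levels(yes)
  }
  # ifelse() drops attributes: Dates become doubles, factors become integer codes
  X <- ifelse(cond, yes, no)
  if (is.fac) {
    # X holds integer codes, so map them back onto the original levels
    X <- factor(levels.y[X], levels = levels.y)
  } else {
    class(X) <- class.y
  }
  return(X)
}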

tdf %>% mutate(across(-valid, ~ safe.ifelse(valid, ., NA)))

  date       team  score valid
  <date>     <fct> <int> <lgl>
1 2021-12-10 T1        3 TRUE 
2 NA         NA       NA FALSE