Using ifelse() with dplyr::if_any(): the FALSE case returns NA

Time:10-12

I'm trying to create a flag column based on other columns in a data frame.

Example:

library(dplyr)

df <- tribble(
  ~x1, ~x2, ~x3, ~x4,
  1, 0, 1, 1,
  0, 0, NA, NA,
  1, 0, NA, 1,
  0, 0, NA, NA,
  0, 1, NA, 0
)

I want to create a flag column such that if the value 1 is present in any of the columns x1 to x4, the flag is 1, and 0 otherwise.

res <- df |> mutate(flag = ifelse(if_any(x1:x4, function(x) x == 1), 1, 0))

I've tried using dplyr::if_any() with ifelse(). It seems to work for the most part, but for some reason it returns NA instead of 0 in the FALSE case.

> res
# A tibble: 5 × 5
     x1    x2    x3    x4  flag
  <dbl> <dbl> <dbl> <dbl> <dbl>
1     1     0     1     1     1
2     0     0    NA    NA    NA
3     1     0    NA     1     1
4     0     0    NA    NA    NA
5     0     1    NA     0     1

Why is this happening, and what would be a better solution?

Edit: I checked what if_any() itself returns, and it gives NA instead of FALSE:

> res
# A tibble: 5 × 6
     x1    x2    x3    x4  flag true_false
  <dbl> <dbl> <dbl> <dbl> <dbl> <lgl>     
1     1     0     1     1     1 TRUE      
2     0     0    NA    NA    NA NA        
3     1     0    NA     1     1 TRUE      
4     0     0    NA    NA    NA NA        
5     0     1    NA     0     1 TRUE      
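One way to see where the NA comes from: if_any() combines the per-column results much like chaining x1 == 1 | x2 == 1 | .... In R's three-valued logic NA == 1 is NA, TRUE | NA is TRUE, but FALSE | NA is NA. The base-R sketch below reproduces the true_false column with Reduce(); it illustrates the semantics only and is not dplyr's internal code.

```r
# Same data as the question, as a base data.frame
df <- data.frame(
  x1 = c(1, 0, 1, 0, 0),
  x2 = c(0, 0, 0, 0, 1),
  x3 = c(1, NA, NA, NA, NA),
  x4 = c(1, NA, 1, NA, 0)
)

# One logical vector per column: TRUE, FALSE, or NA (where the cell is NA)
hits <- lapply(df, function(x) x == 1)

# OR the columns together row by row
any_one <- Reduce(`|`, hits)
any_one
#> [1] TRUE   NA TRUE   NA TRUE

# ifelse() passes an NA condition straight through
ifelse(any_one, 1, 0)
#> [1]  1 NA  1 NA  1
```

Rows 2 and 4 contain no 1 but do contain NA, so the OR can never become TRUE and stays NA.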

Answer:

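Per https://stackoverflow.com/a/44411169/10276092

You can use %in% instead of == to effectively ignore NAs: x %in% 1 returns FALSE rather than NA when x is NA, so if_any() only ever combines TRUE and FALSE values.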

df %>%  mutate(flag = ifelse(if_any(.cols=x1:x4, .fns= ~ . %in% 1), 1, 0))
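Why %in% helps: x %in% table is defined as match(x, table, nomatch = 0) > 0, so it always yields TRUE or FALSE and never NA. A small base-R comparison on the question's x3 column:

```r
x3 <- c(1, NA, NA, NA, NA)

x3 == 1      # comparison with NA propagates NA
#> [1] TRUE   NA   NA   NA   NA
x3 %in% 1    # match()-based: an NA simply "does not match" 1
#> [1]  TRUE FALSE FALSE FALSE FALSE

# Scalar building blocks of the NA in the question
NA == 1           # NA
FALSE | NA        # NA: the result could still go either way
TRUE | NA         # TRUE: one TRUE is enough
ifelse(NA, 1, 0)  # NA
```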

Answer:

Here is one way we could do it:

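library(dplyr)
library(tidyr)  # replace_na()

df %>% 
  rowwise() %>%
  # cur_data() is the current row (deprecated in favour of pick() as of dplyr 1.1.0);
  # any() gives NA for rows with no 1 but some NA, so replace_na() turns that into FALSE
  mutate(flag = any(cur_data() == 1),
         flag = replace_na(flag, FALSE))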
     x1    x2    x3    x4 flag 
  <dbl> <dbl> <dbl> <dbl> <lgl>
1     1     0     1     1 TRUE 
2     0     0    NA    NA FALSE
3     1     0    NA     1 TRUE 
4     0     0    NA    NA FALSE
5     0     1    NA     0 TRUE 
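The NA this approach has to clean up comes from any(): any(c(FALSE, FALSE, NA, NA)) is NA. A sketch of a variant (not the answerer's code) that asks any() to skip the NA comparisons via na.rm = TRUE and selects the columns with c_across(); it assumes a dplyr version that provides c_across() (1.0.0 or later) and converts the result to 1/0 with as.integer() to match the question:

```r
library(dplyr)

df <- data.frame(
  x1 = c(1, 0, 1, 0, 0),
  x2 = c(0, 0, 0, 0, 1),
  x3 = c(1, NA, NA, NA, NA),
  x4 = c(1, NA, 1, NA, 0)
)

res <- df %>%
  rowwise() %>%
  # na.rm = TRUE drops the NA comparisons, so rows without a 1 give FALSE
  mutate(flag = as.integer(any(c_across(x1:x4) == 1, na.rm = TRUE))) %>%
  ungroup()

res$flag
#> [1] 1 0 1 0 1
```

rowwise() evaluates the expression once per row, so on large data the vectorised if_any() approaches are usually faster.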

Answer:

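Or just change the NAs to 0 first (note that this also overwrites the NAs in x1:x4):

library(dplyr)

df %>%
  # replace every NA in x1:x4 with 0 (across() supersedes mutate_each() and funs())
  mutate(across(x1:x4, ~ replace(.x, is.na(.x), 0))) %>%
  mutate(flag = ifelse(if_any(x1:x4, function(x) x == 1), 1, 0))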

Output:

     x1    x2    x3    x4  flag
  <dbl> <dbl> <dbl> <dbl> <dbl>
1     1     0     1     1     1
2     0     0     0     0     0
3     1     0     0     1     1
4     0     0     0     0     0
5     0     1     0     0     1
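Replacing the NAs this way changes x1:x4 in the result. If the original NAs should be kept, one alternative (a sketch, not part of the answer above) is to treat a missing comparison as FALSE only inside the test, using dplyr::coalesce(); it assumes a dplyr version that provides if_any():

```r
library(dplyr)

df <- data.frame(
  x1 = c(1, 0, 1, 0, 0),
  x2 = c(0, 0, 0, 0, 1),
  x3 = c(1, NA, NA, NA, NA),
  x4 = c(1, NA, 1, NA, 0)
)

res <- df %>%
  # coalesce() turns an NA comparison into FALSE; the data itself is untouched
  mutate(flag = as.integer(if_any(x1:x4, ~ coalesce(.x == 1, FALSE))))

res$flag
#> [1] 1 0 1 0 1
res$x3  # the original NAs are still there
#> [1]  1 NA NA NA NA
```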

Answer:

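Another option, using rowSums():

# flags rows whose non-NA values sum to more than 0; assumes the columns hold only 0, 1 and NA
df %>% mutate(flag = as.integer(rowSums(., na.rm = TRUE) > 0))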

#----
# A tibble: 5 x 5
     x1    x2    x3    x4  flag
  <dbl> <dbl> <dbl> <dbl> <int>
1     1     0     1     1     1
2     0     0    NA    NA     0
3     1     0    NA     1     1
4     0     0    NA    NA     0
5     0     1    NA     0     1
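rowSums(..., na.rm = TRUE) > 0 flags any row with a positive total, which matches "contains a 1" only when the columns are 0/1 indicators, as they are in the question. A base-R sketch on a hypothetical matrix whose third row contains a 2 but no 1 shows the difference, and how comparing with 1 before summing restores the exact test:

```r
# Hypothetical data: the third row contains a 2 but no 1
m <- rbind(
  c(1, 0, NA, 1),
  c(0, 0, NA, NA),
  c(0, 2, NA, 0)
)

# Positive-sum test: flags row 3 even though it has no 1
as.integer(rowSums(m, na.rm = TRUE) > 0)
#> [1] 1 0 1

# Count the cells equal to 1 instead (na.rm drops the NA comparisons)
as.integer(rowSums(m == 1, na.rm = TRUE) > 0)
#> [1] 1 0 0
```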

Answer:

From the R manual page for comparison operators (?Comparison):

Note:

Do not use ‘==’ and ‘!=’ for tests, such as in ‘if’ expressions, where you must get a single ‘TRUE’ or ‘FALSE’. Unless you are absolutely sure that nothing unusual can happen, you should use the ‘identical’ function instead.

Following that advice (identical() compares single values, so the data are processed row by row with rowwise()):

library(dplyr)

df %>% 
  rowwise() %>% 
  mutate(flag = if_any(starts_with("x"), ~ identical(.x, 1)) * 1 )
# A tibble: 5 × 5
# Rowwise: 
     x1    x2    x3    x4  flag
  <dbl> <dbl> <dbl> <dbl> <dbl>
1     1     0     1     1     1
2     0     0    NA    NA     0
3     1     0    NA     1     1
4     0     0    NA    NA     0
5     0     1    NA     0     1
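identical() compares type as well as value, so this approach relies on the x columns being doubles: identical(1L, 1) is FALSE. A base-R sketch of that pitfall, with isTRUE(x == 1) shown as one alternative (a suggestion, not from the manual note) that still maps NA to FALSE but compares values with ==:

```r
identical(1, 1)         # TRUE: both doubles
identical(1L, 1)        # FALSE: integer vs double, despite equal values
identical(NA_real_, 1)  # FALSE: NA is simply "not identical" to 1

# isTRUE() is TRUE only for a single non-NA TRUE
isTRUE(1L == 1)         # TRUE: == compares values across integer/double
isTRUE(NA == 1)         # FALSE: the NA comparison becomes FALSE
```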