I'm trying to create a flag column based on other columns in a data frame.
Example:
df <- tribble(
~x1, ~x2, ~x3, ~x4,
1, 0, 1, 1,
0, 0, NA, NA,
1, 0, NA, 1,
0, 0, NA, NA,
0, 1, NA, 0
)
I want to create a flag column such that if the value 1 is present in any of the columns x1 through x4, the flag is 1, and 0 otherwise.
res <- df |> mutate(flag = ifelse(if_any(x1:x4, function(x) x == 1), 1, 0))
I've tried using dplyr::if_any() with ifelse(). It seems to work for the most part, but for some reason it returns NA in the case of false.
> res
# A tibble: 5 × 5
x1 x2 x3 x4 flag
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 0 1 1 1
2 0 0 NA NA NA
3 1 0 NA 1 1
4 0 0 NA NA NA
5 0 1 NA 0 1
Why is this happening? What would be a better solution?
Edit: I checked what the if_any() function itself returns, and it seems to return NA instead of FALSE.
> res
# A tibble: 5 × 6
x1 x2 x3 x4 flag true_flase
<dbl> <dbl> <dbl> <dbl> <dbl> <lgl>
1 1 0 1 1 1 TRUE
2 0 0 NA NA NA NA
3 1 0 NA 1 1 TRUE
4 0 0 NA NA NA NA
5 0 1 NA 0 1 TRUE
CodePudding user response:
Per https://stackoverflow.com/a/44411169/10276092, you can use %in% instead of ==. Unlike ==, %in% never returns NA, so missing values are simply treated as non-matches.
df %>% mutate(flag = ifelse(if_any(.cols=x1:x4, .fns= ~ . %in% 1), 1, 0))
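To see why the original version produced NA, it may help to look at how R's three-valued logic behaves on a single row. The sketch below uses only base R. `row2` and `row3` are hypothetical vectors holding the values of rows 2 and 3 of df, and any() stands in for the way if_any() appears to combine the per-column results (an assumption based on the output in the question, not on dplyr's source).

```r
# Values of row 2 of df (hypothetical helper vector for illustration)
row2 <- c(0, 0, NA, NA)

# == propagates missing values: comparing NA with 1 is "unknown"
eq_result <- row2 == 1        # FALSE FALSE NA NA

# OR-ing FALSE with NA stays unknown, because the NA could have been TRUE
any_eq <- any(eq_result)      # NA

# %in% is built on match(), so an NA that is not in the table is simply FALSE
in_result <- row2 %in% 1      # FALSE FALSE FALSE FALSE
any_in <- any(in_result)      # FALSE

# A TRUE anywhere settles the OR regardless of NAs (row 3 of df)
row3 <- c(1, 0, NA, 1)
any_row3 <- any(row3 == 1)    # TRUE
```

This is why only the rows without any 1 but with at least one NA end up as NA in the question's output.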
CodePudding user response:
Here is one way we could do it:
library(dplyr)
library(tidyr)
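df %>%
  rowwise() %>%
  # cur_data() is deprecated as of dplyr 1.1.0; pick(everything()) is its replacement
  mutate(flag = any(cur_data() == 1),
         # any() yields NA when no 1 is found but an NA is present
         flag = replace_na(flag, FALSE))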
x1 x2 x3 x4 flag
<dbl> <dbl> <dbl> <dbl> <lgl>
1 1 0 1 1 TRUE
2 0 0 NA NA FALSE
3 1 0 NA 1 TRUE
4 0 0 NA NA FALSE
5 0 1 NA 0 TRUE
CodePudding user response:
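Or just change the NAs to 0 first. Note that this also overwrites the NAs in the original columns, as the output shows. mutate_each() and funs() are deprecated, so this uses across() instead:
df %>% mutate(across(everything(), ~ replace(.x, is.na(.x), 0))) %>% mutate(flag = ifelse(if_any(x1:x4, ~ .x == 1), 1, 0))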
Output:
x1 x2 x3 x4 flag
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 0 1 1 1
2 0 0 0 0 0
3 1 0 0 1 1
4 0 0 0 0 0
5 0 1 0 0 1
CodePudding user response:
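Another option is rowSums(). This relies on the columns containing only 0, 1, and NA, so a positive row sum means at least one 1 is present:
df %>% mutate(flag = as.integer(rowSums(., na.rm = TRUE) > 0))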
#----
# A tibble: 5 x 5
x1 x2 x3 x4 flag
<dbl> <dbl> <dbl> <dbl> <int>
1 1 0 1 1 1
2 0 0 NA NA 0
3 1 0 NA 1 1
4 0 0 NA NA 0
5 0 1 NA 0 1
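If the columns could ever hold values other than 0 and 1, summing the raw values would no longer mean "at least one 1". A variant that counts the matches explicitly might look like the base R sketch below. The matrix `m` is a hypothetical copy of the example data, used here only so the snippet runs without dplyr.

```r
# Hypothetical matrix copy of the example data from the question
m <- cbind(
  x1 = c(1, 0, 1, 0, 0),
  x2 = c(0, 0, 0, 0, 1),
  x3 = c(1, NA, NA, NA, NA),
  x4 = c(1, NA, 1, NA, 0)
)

# m == 1 is a logical matrix; rowSums() counts the TRUEs per row,
# and na.rm = TRUE drops the NA comparisons instead of propagating them
n_ones <- rowSums(m == 1, na.rm = TRUE)

# At least one match means the flag is 1, otherwise 0
flag <- as.integer(n_ones > 0)
flag
```

Inside a pipeline the same idea would presumably be written as rowSums(. == 1, na.rm = TRUE) > 0.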
CodePudding user response:
From the R manual pages (Note section):
Do not use ‘==’ and ‘!=’ for tests, such as in ‘if’ expressions, where you must get a single ‘TRUE’ or ‘FALSE’. Unless you are absolutely sure that nothing unusual can happen, you should use the ‘identical’ function instead.
Following that advice:
library(dplyr)
df %>%
  rowwise() %>%
  mutate(flag = if_any(starts_with("x"), ~ identical(.x, 1)) * 1)
# A tibble: 5 × 5
# Rowwise:
x1 x2 x3 x4 flag
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 0 1 1 1
2 0 0 NA NA 0
3 1 0 NA 1 1
4 0 0 NA NA 0
5 0 1 NA 0 1
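The rowwise() step matters here because identical() compares whole objects rather than working element by element. A small base R sketch of that behaviour follows; the vector `x1` is a hypothetical copy of the first column of df.

```r
# Hypothetical copy of column x1 from the example data
x1 <- c(1, 0, 1, 0, 0)

# Without rowwise(), the whole column is compared with 1 in one go
whole_column <- identical(x1, 1)        # FALSE: length-5 vector vs length-1

# With rowwise(), each call sees a single value
single_value <- identical(x1[1], 1)     # TRUE

# NA is simply "not identical to 1", so no NA leaks into the flag
na_value <- identical(NA_real_, 1)      # FALSE

# identical() is also type-strict: an integer 1 is not a double 1
int_value <- identical(1L, 1)           # FALSE
```

The last line suggests that this approach would flag nothing if the columns were stored as integers, in which case comparing against 1L, or using %in%, may be the safer choice.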