I am trying to create an indicator, "newvar", that shows whether at least one of three binary variables takes a value of 1. I want the new variable to be missing only if all three variables are missing, not whenever at least one of them is missing. Are these approaches correct?
df<- df %>% rowwise() %>% mutate(newvar = sum(var1, var2,var3), na.rm=T)
df$newvar <- as.integer(var1|var2|var3, na.rm=TRUE)
CodePudding user response:
Maybe this is what you want.
df$newvar <- as.numeric(apply(df[c("a", "b", "c")], 1, function(x) {
  # NA only when every value in the row is missing
  if (all(is.na(x))) return(NA)
  # otherwise TRUE if any non-missing value equals 1
  any(x == 1, na.rm = TRUE)
}))
df
    a  b  c newvar
1   1  1  1      1
2   0 NA  0      0
3   1  1 NA      1
4   0  0  0      0
5   1  1  1      1
6   1  1  1      1
7  NA NA NA     NA
8   0  0  0      0
9   1  0  1      1
10  1  1  0      1
Or with sum:
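df$newvar <- as.numeric(apply(df[c("a", "b", "c")], 1, function(x) {
  # NA only when every value in the row is missing
  if (all(is.na(x))) return(NA)
  # otherwise TRUE if the non-missing values sum to more than 0
  sum(x, na.rm = TRUE) > 0
}))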
df
    a  b  c newvar
1   1  1  1      1
2   0 NA  0      0
3   1  1 NA      1
4   0  0  0      0
5   1  1  1      1
6   1  1  1      1
7  NA NA NA     NA
8   0  0  0      0
9   1  0  1      1
10  1  1  0      1
Data
df <- structure(list(a = c(1L, 0L, 1L, 0L, 1L, 1L, NA, 0L, 1L, 1L),
b = c(1L, NA, 1L, 0L, 1L, 1L, NA, 0L, 0L, 1L), c = c(1L,
0L, NA, 0L, 1L, 1L, NA, 0L, 1L, 0L)), class = "data.frame", row.names = c(NA,
-10L))
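If apply feels slow or hard to read, the same rule can probably be written in a vectorized way with rowSums. This is only a sketch under the assumption that the three columns are coded strictly as 0, 1, or NA; the helper name vars is mine, not part of the question.

```r
# Same data as above, rebuilt here so the snippet runs on its own
df <- data.frame(
  a = c(1L, 0L, 1L, 0L, 1L, 1L, NA, 0L, 1L, 1L),
  b = c(1L, NA, 1L, 0L, 1L, 1L, NA, 0L, 0L, 1L),
  c = c(1L, 0L, NA, 0L, 1L, 1L, NA, 0L, 1L, 0L)
)

vars <- c("a", "b", "c")  # hypothetical helper: the columns to combine

# 1 if at least one non-missing value in the row equals 1, otherwise 0
df$newvar <- as.integer(rowSums(df[vars] == 1, na.rm = TRUE) > 0)

# Overwrite with NA only where the row has no non-missing value at all
df$newvar[rowSums(!is.na(df[vars])) == 0] <- NA

df
```

Row 2 (0, NA, 0) gives 0 and row 7 (all NA) gives NA, matching the output shown above.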
CodePudding user response:
We can use dplyr::rowwise combined with dplyr::c_across to reduce the code, especially if there are more than 3 variables.
library(tidyverse)
set.seed(4)
df <- rerun(3, sample(c(0, 1, NA), 10, replace = TRUE)) %>% as_tibble(.name_repair = "unique")
#> New names:
#> * `` -> ...1
#> * `` -> ...2
#> * `` -> ...3
df %>%
  rowwise() %>%
  mutate(newvar = c_across(starts_with(".")) %>%
    {
      case_when(
        all(is.na(.)) ~ NA,
        any(. == 1) ~ TRUE,
        any(. != 1) & any(. == 0) ~ FALSE
      )
    } %>%
    as.numeric())
#> # A tibble: 10 × 4
#> # Rowwise:
#>     ...1  ...2  ...3 newvar
#>    <dbl> <dbl> <dbl>  <dbl>
#>  1    NA     1     0      1
#>  2    NA     1     1      1
#>  3    NA     1    NA      1
#>  4    NA     0     0      0
#>  5    NA    NA    NA     NA
#>  6     1    NA     0      1
#>  7     0    NA     1      1
#>  8     1    NA     0      1
#>  9    NA     0     0      0
#> 10     1     0     1      1
Created on 2021-12-04 by the reprex package (v2.0.1)
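Since the variables are binary, the row-wise maximum is already the "at least one 1" indicator, so pmax with na.rm = TRUE may be a shorter route: it ignores missing values and returns NA only when every input is NA. The sketch below assumes strict 0/1/NA coding; the column names x1, x2, x3 are placeholders, and the values are copied from the printed output above.

```r
# Values copied from the tibble printed above; x1..x3 are placeholder names
df2 <- data.frame(
  x1 = c(NA, NA, NA, NA, NA, 1, 0, 1, NA, 1),
  x2 = c(1, 1, 1, 0, NA, NA, NA, NA, 0, 0),
  x3 = c(0, 1, NA, 0, NA, 0, 1, 0, 0, 1)
)

# Parallel maximum across the columns; with na.rm = TRUE the result is NA
# only for rows where all inputs are NA
df2$newvar <- do.call(pmax, c(df2[c("x1", "x2", "x3")], na.rm = TRUE))

df2
```

Inside a dplyr pipeline the equivalent call would be mutate(newvar = pmax(x1, x2, x3, na.rm = TRUE)), which does not need rowwise() because pmax is already vectorized over rows.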