sum vars if at least one non-missing

Time:12-05

I am trying to create an indicator "newvar" that shows whether at least one of three binary variables takes the value 1. I want the new variable to be missing only if all three variables are missing, not whenever at least one of them is missing. Are these approaches correct?

df<- df %>% rowwise() %>% mutate(newvar = sum(var1, var2,var3), na.rm=T)

df$newvar <- as.integer(var1|var2|var3, na.rm=TRUE)

CodePudding user response:

Maybe this is what you want.

df$newvar <- as.numeric( apply( df, 1, function(x){
    # NA only when every value in the row is missing
    if( all( is.na(x) ) ) return(NA)
    # otherwise TRUE if any non-missing value equals 1
    any( x==1, na.rm=TRUE ) } ) )

df
    a  b  c newvar
1   1  1  1      1
2   0 NA  0      0
3   1  1 NA      1
4   0  0  0      0
5   1  1  1      1
6   1  1  1      1
7  NA NA NA     NA
8   0  0  0      0
9   1  0  1      1
10  1  1  0      1

Or with sum:

df$newvar <- as.numeric( apply( df, 1, function(x){
    # NA only when every value in the row is missing
    if( all( is.na(x) ) ) return(NA)
    # otherwise TRUE if the non-missing values sum to more than 0
    sum( x, na.rm=TRUE ) > 0 } ) )

df
    a  b  c newvar
1   1  1  1      1
2   0 NA  0      0
3   1  1 NA      1
4   0  0  0      0
5   1  1  1      1
6   1  1  1      1
7  NA NA NA     NA
8   0  0  0      0
9   1  0  1      1
10  1  1  0      1

Data

df <- structure(list(a = c(1L, 0L, 1L, 0L, 1L, 1L, NA, 0L, 1L, 1L), 
    b = c(1L, NA, 1L, 0L, 1L, 1L, NA, 0L, 0L, 1L), c = c(1L, 
    0L, NA, 0L, 1L, 1L, NA, 0L, 1L, 0L)), class = "data.frame", row.names = c(NA, 
-10L))
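
To address the question directly, here is a quick base R check of why the two original attempts fall short, followed by a vectorized sketch with rowSums() that avoids apply(). In the first attempt, na.rm=T sits outside sum(), so mutate() treats it as a new column; even when moved inside sum(), an all-missing row would give 0 rather than NA. The names vars, all_missing, and any_one are only illustrative, and the data are rebuilt here so the snippet runs on its own.

```r
# Same values as the Data block above
df <- data.frame(
  a = c(1L, 0L, 1L, 0L, 1L, 1L, NA, 0L, 1L, 1L),
  b = c(1L, NA, 1L, 0L, 1L, 1L, NA, 0L, 0L, 1L),
  c = c(1L, 0L, NA, 0L, 1L, 1L, NA, 0L, 1L, 0L)
)

# Pitfall 1: sum() with na.rm = TRUE gives 0, not NA, when everything is missing
sum(c(NA, NA, NA), na.rm = TRUE)   # 0

# Pitfall 2: `|` propagates NA unless another operand is TRUE,
# and as.integer() silently ignores na.rm
NA | FALSE   # NA, so a row like (0, NA, 0) would become NA
NA | TRUE    # TRUE

# Vectorized sketch (illustrative names, not from the original post)
vars <- c("a", "b", "c")
all_missing <- rowSums(!is.na(df[vars])) == 0          # TRUE where every variable is NA
any_one <- rowSums(df[vars] == 1, na.rm = TRUE) > 0    # TRUE where at least one value is 1
newvar <- ifelse(all_missing, NA, as.integer(any_one))
```

This should reproduce the newvar column shown above; swapping vars for other column names is enough to apply it to more variables.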

CodePudding user response:

We can use dplyr::rowwise combined with dplyr::c_across to shorten the code, especially when there are more than three variables.

library(tidyverse)

set.seed(4)

df <- rerun(3, sample(c(0, 1, NA), 10, replace = TRUE)) %>% as_tibble(.name_repair = "unique")
#> New names:
#> * `` -> ...1
#> * `` -> ...2
#> * `` -> ...3

df %>%
  rowwise() %>%
  mutate(newvar = c_across(starts_with(".")) %>%
    {
      case_when(
        all(is.na(.)) ~ NA,
        any(. == 1) ~ TRUE,
        any(. != 1) & any(. == 0) ~ FALSE
      )
    } %>%
    as.numeric())
#> # A tibble: 10 × 4
#> # Rowwise: 
#>     ...1  ...2  ...3 newvar
#>    <dbl> <dbl> <dbl>  <dbl>
#>  1    NA     1     0      1
#>  2    NA     1     1      1
#>  3    NA     1    NA      1
#>  4    NA     0     0      0
#>  5    NA    NA    NA     NA
#>  6     1    NA     0      1
#>  7     0    NA     1      1
#>  8     1    NA     0      1
#>  9    NA     0     0      0
#> 10     1     0     1      1

Created on 2021-12-04 by the reprex package (v2.0.1)
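
If the variables are strictly coded 0/1, a shorter alternative is possible without any row-wise iteration: pmax() with na.rm = TRUE takes the element-wise maximum and returns NA only where all inputs are NA. This is a minimal base R sketch under that 0/1 assumption, not part of the answer above; the data frame and the name vars are illustrative.

```r
# Illustrative data with 0/1/NA values (same as the first answer's Data block)
df <- data.frame(
  a = c(1L, 0L, 1L, 0L, 1L, 1L, NA, 0L, 1L, 1L),
  b = c(1L, NA, 1L, 0L, 1L, 1L, NA, 0L, 0L, 1L),
  c = c(1L, 0L, NA, 0L, 1L, 1L, NA, 0L, 1L, 0L)
)

vars <- c("a", "b", "c")   # hypothetical list of the binary columns

# Element-wise maximum across the columns; with na.rm = TRUE the result is
# NA only for rows where every column is NA. Assumes values are 0, 1, or NA.
newvar <- do.call(pmax, c(unname(as.list(df[vars])), list(na.rm = TRUE)))
```

Inside a dplyr pipeline the same idea would read mutate(newvar = pmax(var1, var2, var3, na.rm = TRUE)), with no need for rowwise().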
