How to generate a new variable from multiple columns in R

This is the data:

a =  c(1,0,0,NA,0,1)
b =  c(0,1,0,NA,NA,0)
c =  c(0,1,0,NA,NA,NA)
cbind(a,b,c) -> df

I want to generate a variable named x that meets the following requirements:

If at least one of the three columns contains a ‘1’, x is ‘1’; otherwise x is ‘0’.
If a row has no ‘1’ and contains any missing value, x is returned as a missing value, NA (as in rows 4 and 5 of the expected output below).

df
      a  b  c      x
[1,]  1  0  0      1
[2,]  0  1  1      1
[3,]  0  0  0      0
[4,] NA NA NA      NA
[5,]  0 NA NA      NA
[6,]  1  0 NA      1

CodePudding user response：

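For a vectorized solution, work with logical vectors: count the 1s in each row with rowSums, then set x to NA wherever a row has no 1 but contains a missing value.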

a =  c(1,0,0,NA,0,1)
b =  c(0,1,0,NA,NA,0)
c =  c(0,1,0,NA,NA,NA)
cbind(a,b,c) -> df

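# Number of 1s in each row, ignoring missing values
ones <- rowSums(df == 1, na.rm = TRUE)
# TRUE if the row has at least one 1 (becomes 1/0 when bound to the numeric matrix)
x <- ones > 0
# Set x to NA where the row has a missing value and no 1
is.na(x) <- rowSums(is.na(df)) > 0 & ones == 0
rm(ones)
cbind(df, x)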
#>       a  b  c  x
#> [1,]  1  0  0  1
#> [2,]  0  1  1  1
#> [3,]  0  0  0  0
#> [4,] NA NA NA NA
#> [5,]  0 NA NA NA
#> [6,]  1  0 NA  1

Created on 2022-08-21 by the reprex package (v2.0.1)
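
The same rules can also be written with two logical helper vectors and a nested ifelse, which may be easier to read than the is.na<- replacement. This is only a sketch of an equivalent formulation; the names has_one and has_na are made up for illustration and are not part of the answer above.

```r
# Same data as in the question
df <- cbind(a = c(1, 0, 0, NA, 0, 1),
            b = c(0, 1, 0, NA, NA, 0),
            c = c(0, 1, 0, NA, NA, NA))

# Hypothetical helper names, chosen for this sketch only
has_one <- rowSums(df == 1, na.rm = TRUE) > 0  # TRUE where the row contains a 1
has_na  <- rowSums(is.na(df)) > 0              # TRUE where the row contains an NA

# 1 if a 1 is present; otherwise NA if anything is missing; otherwise 0
x <- ifelse(has_one, 1, ifelse(has_na, NA_real_, 0))
cbind(df, x)
```

Because the check for a 1 comes first, a row such as 1, 0, NA still yields 1 even though it contains a missing value.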

CodePudding user response：

We can write a custom function that checks a single row, then run it on every row of the matrix with apply.

check_row <- function(x) {
  # Return 1 if any value is 1
  if (any(x == 1, na.rm = TRUE)) return(1)
  # Return 0 if all the values are 0 (%in% gives FALSE for NA, so rows with NA fall through)
  if (all(x %in% 0)) return(0)
  # Otherwise return NA
  NA
}

df <- cbind(df, x = apply(df, 1, check_row))
df

#      a  b  c  x
#[1,]  1  0  0  1
#[2,]  0  1  1  1
#[3,]  0  0  0  0
#[4,] NA NA NA NA
#[5,]  0 NA NA NA
#[6,]  1  0 NA  1
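
To see why check_row tests x %in% 0 rather than x == 0, it can help to call the function on single rows. A small sketch follows; the function is repeated so the snippet runs on its own, and the test vectors are illustrative.

```r
check_row <- function(x) {
  if (any(x == 1, na.rm = TRUE)) return(1)
  if (all(x %in% 0)) return(0)
  NA
}

# %in% never returns NA, whereas == propagates it
NA %in% 0   # FALSE
NA == 0     # NA

check_row(c(1, 0, NA))   # 1: a 1 is present, the NA is ignored
check_row(c(0, 0, 0))    # 0: every value is 0
check_row(c(0, NA, NA))  # NA: no 1, and not every value is 0
```

With x == 0, all() would return NA for a row like 0, NA, NA, and the if statement would stop with an error instead of falling through to NA.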

CodePudding user response：

Here is a dplyr option: convert the matrix to a data frame, use case_when to apply your requirements, then convert back to a matrix.

library(dplyr)

as.data.frame(df) %>%
  mutate(x = rowSums(across(everything()), na.rm = TRUE),
         x = case_when(x >= 1 ~ 1,
                       x == 0 & if_any(everything(), is.na) ~ NA_real_,
                       TRUE ~ 0)) %>%
  as.matrix()

Output

      a  b  c  x
[1,]  1  0  0  1
[2,]  0  1  1  1
[3,]  0  0  0  0
[4,] NA NA NA NA
[5,]  0 NA NA NA
[6,]  1  0 NA  1
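
If the intermediate row sum is not wanted, the conditions can probably be expressed with if_any alone. The following is a sketch under the assumption of a dplyr version that supports if_any inside mutate (1.0.4 or later); the name res is made up for illustration, and the result is left as a data frame.

```r
library(dplyr)

# Same data as in the question
df <- cbind(a = c(1, 0, 0, NA, 0, 1),
            b = c(0, 1, 0, NA, NA, 0),
            c = c(0, 1, 0, NA, NA, NA))

# res is a hypothetical name used only in this sketch
res <- as.data.frame(df) %>%
  mutate(x = case_when(
    if_any(everything(), ~ .x %in% 1) ~ 1,         # any 1 in the row
    if_any(everything(), is.na)       ~ NA_real_,  # no 1, but something is missing
    TRUE                              ~ 0          # no 1 and nothing missing
  ))
res
```

case_when evaluates its conditions in order, so a row containing both a 1 and an NA is caught by the first condition and gets 1.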