How to create new variable conditional on missingness on others in dplyr, R?

Time:04-16

Consider these data:

library(dplyr)

d <- tibble(student.status = c(0, 1, NA, 0, 1, 1),
            student.school.hs = c(NA, 1, NA, NA, NA, NA),
            student.school.alths = c(NA, NA, NA, NA, NA, 1),
            student.school.allNA = c(TRUE, FALSE, TRUE, TRUE, TRUE, FALSE)) 

  student.status student.school.hs student.school.alt… student.school.…
           <dbl>             <dbl>               <dbl> <lgl>           
1              0                NA                  NA TRUE            
2              1                 1                  NA FALSE           
3             NA                NA                  NA TRUE            
4              0                NA                  NA TRUE            
5              1                NA                  NA TRUE            
6              1                NA                   1 FALSE 
  • I want to assign 0 to the NA student.school.* columns when student.status == 1 and not all of the student.school.* columns are NA (that is, at least one of them has a value).

  • If all of the student.school.* columns are NA and student.status == 1, then leave them NA.

  • If student.status == 0, then all the student.school.* columns should stay NA.

The final data should look like:

  student.status student.school.hs student.school.alt… student.school.…
           <dbl>             <dbl>               <dbl> <lgl>           
1              0                NA                  NA TRUE            
2              1                 1                   0 FALSE           
3             NA                NA                  NA TRUE            
4              0                NA                  NA TRUE            
5              1                NA                  NA TRUE            
6              1                 0                   1 FALSE     

Answer:

Perhaps this helps: loop across the columns whose names start with the prefix 'student.school', dropping the logical column from the selection with -where(is.logical), since student.school.allNA shares the prefix but has a different type. Inside across(), case_when() replaces a value with 0 only when it is NA, student.status is 1, and student.school.allNA is FALSE (hence the negation with !); every other value is left unchanged. Using student.status %in% 1 rather than == 1 means rows where student.status is NA evaluate to FALSE instead of NA.

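library(dplyr)
d <- d %>%
   # select the numeric student.school.* columns; -where(is.logical) drops student.school.allNA
   mutate(across(c(starts_with('student.school'), -where(is.logical)),
   # replace NA with 0 only if student.status is 1 and not every school column is NA
   ~ case_when(student.status %in% 1 & !student.school.allNA & is.na(.x) ~ 0,
     TRUE ~ .x)))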

-output

> d
# A tibble: 6 × 4
  student.status student.school.hs student.school.alths student.school.allNA
           <dbl>             <dbl>                <dbl> <lgl>               
1              0                NA                   NA TRUE                
2              1                 1                    0 FALSE               
3             NA                NA                   NA TRUE                
4              0                NA                   NA TRUE                
5              1                NA                   NA TRUE                
6              1                 0                    1 FALSE       
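
If the student.school.allNA helper column were not already in the data, the same rule could probably be written by computing the flag on the fly. The following is only a sketch under that assumption: d2, school_cols, all_na and res are names made up for illustration, and if_all() inside mutate() needs a reasonably recent dplyr (1.0.4 or later).

```r
library(dplyr)

# Same data as in the question, but without the precomputed
# student.school.allNA column (d2 is a hypothetical name).
d2 <- tibble(student.status = c(0, 1, NA, 0, 1, 1),
             student.school.hs = c(NA, 1, NA, NA, NA, NA),
             student.school.alths = c(NA, NA, NA, NA, NA, 1))

# Columns to fill; listed explicitly here for clarity.
school_cols <- c("student.school.hs", "student.school.alths")

res <- d2 %>%
  # all_na is TRUE when every school column in the row is NA
  mutate(all_na = if_all(all_of(school_cols), is.na)) %>%
  # fill NA with 0 only for students (status 1) with at least one school value
  mutate(across(all_of(school_cols),
                ~ case_when(student.status %in% 1 & !all_na & is.na(.x) ~ 0,
                            TRUE ~ .x))) %>%
  # drop the temporary flag again
  select(-all_na)

res
```

The flag is computed in its own mutate() step before across() runs, so it reflects the original missingness rather than values that were already filled in.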