Home > database >  r multiple chained ifelse
r multiple chained ifelse

Time:12-01

My apologies if this is a duplicate question. Basically I have a dataset with lets say 4 columns. Each have missing data and they are MAR.

col1    col2    col3    col4
7       4       3       7
        0               4
7      10       4   
5               8       7
1       9               7
8      10       5   
        1       4       5
3           
7      10       7       5
2       2       4   
        7               2
9               2   
6       0       9       9
        3       9   
6                       5
0       7       6   

My goal is to create one column Col5. This column will contain values like this

Step1) Check if Col1 is missing, if not update Col5 with value from Col1

Step2) Check if Col1 is missing, if Col1 is missing then check Col2, if Col1 is missing and Col2 is not missing then update Col5 value with value from Col2

Step3) If Col1 & Col3 are missing and Col3 is not missing, then update col5 with values from col3

Step4) If Col1 & Col2 & Col3 are missing and Col4 is not missing, then update col5 with values from Col4.

Step5) If all 4 columns are missing then Col5 is NA.

Expected final dataset

col1    col2    col3    col4   col5
7       4       3       7      7
        0               4      0
7      10       4              7
5               8       7      5 
1       9               7      1
8      10       5              8
        1       4       5      1
3                              3
7      10       7       5      7
2       2       4              2
        7               2      7
9               2              9
                4       9      4
                        9      9        
6                       5      6
0       7       6              0

I can do this using 3 different ifelse statmements, if anybody has an efficient way to accomplish this , that will help. Thanks

CodePudding user response:

We can use the coalesce() function from the dplyr package:

library(dplyr)

df$col5 <- coalesce(df$col1, df$col2, df$col3, df$col4)

The above assumes that the "missing" values are NA.

  • Related