My apologies if this is a duplicate question. Basically I have a dataset with lets say 4 columns. Each have missing data and they are MAR.
col1 col2 col3 col4
7 4 3 7
0 4
7 10 4
5 8 7
1 9 7
8 10 5
1 4 5
3
7 10 7 5
2 2 4
7 2
9 2
6 0 9 9
3 9
6 5
0 7 6
My goal is to create one column Col5
. This column will contain values like this
Step1) Check if Col1 is missing, if not update Col5 with value from Col1
Step2) Check if Col1 is missing, if Col1 is missing then check Col2, if Col1 is missing and Col2 is not missing then update Col5 value with value from Col2
Step3) If Col1 & Col3 are missing and Col3 is not missing, then update col5 with values from col3
Step4) If Col1 & Col2 & Col3 are missing and Col4 is not missing, then update col5 with values from Col4.
Step5) If all 4 columns are missing then Col5 is NA.
Expected final dataset
col1 col2 col3 col4 col5
7 4 3 7 7
0 4 0
7 10 4 7
5 8 7 5
1 9 7 1
8 10 5 8
1 4 5 1
3 3
7 10 7 5 7
2 2 4 2
7 2 7
9 2 9
4 9 4
9 9
6 5 6
0 7 6 0
I can do this using 3 different ifelse statmements, if anybody has an efficient way to accomplish this , that will help. Thanks
CodePudding user response:
We can use the coalesce()
function from the dplyr
package:
library(dplyr)
df$col5 <- coalesce(df$col1, df$col2, df$col3, df$col4)
The above assumes that the "missing" values are NA
.