R: impute missing data in two columns

Time:10-06

I have a dataset like this.

  ID   Yr    Month
  1    3     NA
  2    4     23
  3    NA    46
  4    1     19
  5    NA    NA

I would like to create a new column, Age, where:

 Case 1: Age = Yr, if only Month is missing
 Case 2: Age = Yr + Month/12, if neither Yr nor Month is missing
 Case 3: Age = Month/12, if only Yr is missing
 Case 4: Age = NA, if both Yr and Month are missing
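Taken together, the four cases form an ordered chain of tests, so they map directly onto nested base R ifelse() calls. This is a minimal sketch, not part of the original question. It assumes the data is held in a data frame named df, rebuilt here from the example table, and it needs no packages.

```r
# Example data from the question (Yr in years, Month in months)
df <- data.frame(
  ID    = 1:5,
  Yr    = c(3, 4, NA, 1, NA),
  Month = c(NA, 23, 46, 19, NA)
)

# Nested ifelse(): each test is applied element-wise, and the first
# matching branch supplies the value for that row
df$Age <- with(df,
  ifelse(is.na(Yr) & is.na(Month), NA_real_,   # Case 4: both missing
  ifelse(is.na(Month), Yr,                     # Case 1: only Month missing
  ifelse(is.na(Yr), Month / 12,                # Case 3: only Yr missing
         Yr + Month / 12))))                   # Case 2: both present

print(df)
```

The both-missing test comes first so that the later branches only ever see rows where at least one value is present.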

The expected final dataset (Age rounded to two decimal places) should look like this.

  ID   Yr    Month   Age
  1    3     NA      3
  2    4     23      5.92
  3    NA    46      3.83
  4    1     19      2.58
  5    NA    NA      NA

I can accomplish this with 30 lines of code, but I am looking for a simpler and more efficient solution. Any suggestions would be much appreciated. Thanks in advance.

Answer:

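You can put all four conditions in a single case_when() call. The conditions are evaluated in order and the first one that matches decides the value, so the more specific both-missing case is listed first.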

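library(dplyr)

# Example data from the question
df <- data.frame(ID = 1:5, Yr = c(3, 4, NA, 1, NA), Month = c(NA, 23, 46, 19, NA))

df %>%
  mutate(Age = case_when(is.na(Month) & is.na(Yr) ~ NA_real_,   # Case 4: both missing
                         is.na(Month) ~ as.numeric(Yr),         # Case 1: only Month missing
                         is.na(Yr) ~ Month/12,                  # Case 3: only Yr missing
                         TRUE ~ Yr + Month/12))                 # Case 2: both present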

#  ID Yr Month      Age
#1  1  3    NA 3.000000
#2  2  4    23 5.916667
#3  3 NA    46 3.833333
#4  4  1    19 2.583333
#5  5 NA    NA       NA
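Cases 1 to 3 share one pattern: Age is the sum of whichever of Yr and Month/12 are present. That suggests a shorter vectorised alternative in base R using rowSums(..., na.rm = TRUE). This is a swapped-in technique, not the answer's case_when() approach, and it assumes the same data frame df. Because rowSums() returns 0 for a row in which every value is NA, Case 4 has to be set back to NA explicitly.

```r
# Example data from the question
df <- data.frame(
  ID    = 1:5,
  Yr    = c(3, 4, NA, 1, NA),
  Month = c(NA, 23, 46, 19, NA)
)

# Add the available parts: years plus months converted to years.
# na.rm = TRUE skips a missing part instead of propagating NA.
df$Age <- rowSums(cbind(df$Yr, df$Month / 12), na.rm = TRUE)

# rowSums() yields 0 when both parts are missing; restore NA (Case 4)
df$Age[is.na(df$Yr) & is.na(df$Month)] <- NA

print(df)
```

This avoids spelling out each case, at the cost of making the Case 4 rule a separate step.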