Home > Software engineering >  r replace missing values with a constant and column name follow a common pattern
r replace missing values with a constant and column name follow a common pattern

Time:10-17

My dataset has columns and values like this. The column names all start with a common string, Col_a_**

 ID    Col_a_01    Col_a_02    Col_a_03
 1     12.02       10.89       9.02
 2     10.20       NA          7.56
 3     NA          6.58        3.89
 4     10.89       14.56       11.89
 5     15.47       13.38       16.46

My goal is to replace the missing values with the mode values for that column. I can do this like this below

getmode <- function(v) {
   uniqv <- unique(v)
   uniqv[which.max(tabulate(match(v, uniqv)))]
}

 df$Col_a_01[is.na(Col_a_01)==TRUE] <- getmode(df$Col_a_01)
 df$Col_a_03[is.na(Col_a_02)==TRUE] <- getmode(df$Col_a_02)
 df$Col_a_03[is.na(Col_a_03)==TRUE] <- getmode(df$Col_a_03)

But this becomes unwieldy if I have 100 columns starting with the similar names ending in 1,2,3..100. I am curious if there is an easier and more elegant way to accomplish this. Thanks in advance.

CodePudding user response:

You can change the NA values with ifelse/replace, to apply a function to multiple columns use across in dplyr.

library(dplyr)
df <- df %>% 
       mutate(across(starts_with('Col_a'), ~replace(., is.na(.), getmode(.))))

In base R , use lapply -

cols <- grep('Col_a', names(df))
df[cols] <- lapply(df[cols], function(x) replace(x, is.na(x), getmode(x)))
  • Related