My dataset has columns and values like this. The column names all start with a common string, Col_a_**
ID Col_a_01 Col_a_02 Col_a_03
1 12.02 10.89 9.02
2 10.20 NA 7.56
3 NA 6.58 3.89
4 10.89 14.56 11.89
5 15.47 13.38 16.46
My goal is to replace the missing values with the mode values for that column. I can do this like this below
getmode <- function(v) {
uniqv <- unique(v)
uniqv[which.max(tabulate(match(v, uniqv)))]
}
df$Col_a_01[is.na(Col_a_01)==TRUE] <- getmode(df$Col_a_01)
df$Col_a_03[is.na(Col_a_02)==TRUE] <- getmode(df$Col_a_02)
df$Col_a_03[is.na(Col_a_03)==TRUE] <- getmode(df$Col_a_03)
But this becomes unwieldy if I have 100 columns starting with the similar names ending in 1,2,3..100. I am curious if there is an easier and more elegant way to accomplish this. Thanks in advance.
CodePudding user response:
You can change the NA
values with ifelse
/replace
, to apply a function to multiple columns use across
in dplyr
.
library(dplyr)
df <- df %>%
mutate(across(starts_with('Col_a'), ~replace(., is.na(.), getmode(.))))
In base R , use lapply
-
cols <- grep('Col_a', names(df))
df[cols] <- lapply(df[cols], function(x) replace(x, is.na(x), getmode(x)))