So I have this tibble main_df, which has columns like "Rainfall_(mm)", "Speed_of_maximum_wind_gust_(km/h)", "9am_Temperature", "9am_relative_humidity_(%)", "9am_cloud_amount_(oktas)", etc. I tried to identify the numeric columns with this code:
col_type_vector <- sapply(main_df, typeof)
For every numeric column I want to replace the NA values with the median of that column. Note that I start from column 3 because I don't want to touch the first 2 columns.
The function and the loop are below:
set_na_to_median <- function(data_frame, column_name) {
    median_value <- median(data_frame[[column_name]], na.rm = TRUE)
    na_indices <- which(is.na(data_frame[column_name]))
    data_frame[na_indices, column_name] <- median_value
}
col_type_vector <- sapply(main_df, typeof)
for (item in names(col_type_vector)[3:length(names(col_type_vector))]) {
    if (col_type_vector[item] == "integer" | col_type_vector[item] == "double" | col_type_vector[item] == "numeric") {
        set_na_to_median(main_df, item)
    }
}
But when I run it, the NA values do not get replaced. If I run the same code manually, outside the function and the loop, it works perfectly. I have basically wasted my whole day on this. What am I doing wrong?
Thanks in advance.
Answer:
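Your function modifies a local copy of the tibble and never returns it. R passes arguments by value, so main_df itself is never changed. On top of that, the last expression in set_na_to_median is an assignment, and the value of an assignment is its right-hand side (median_value), not the data frame. Add data_frame as the last line of the function so that it returns the modified tibble, then store the result by replacing
set_na_to_median(main_df, item)
with
main_df <- set_na_to_median(main_df, item)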
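A minimal sketch of the fixed approach, using a small hypothetical data frame weather_df in place of main_df (its column names are made up for illustration). Because the function works on a copy, set_na_to_median() returns the modified data frame and the loop assigns it back. Two substitutions on my part, not from the original answer: the function edits the column vector directly via data_frame[[column_name]], and the loop tests each column with is.numeric().

```r
# Hypothetical stand-in for main_df: two identifier columns, then measurements
weather_df <- data.frame(
  station = c("A", "B", "C", "D"),
  date = c("2021-01-01", "2021-01-02", "2021-01-03", "2021-01-04"),
  rainfall = c(1.2, NA, 3.4, 0.0),
  gust_kmh = c(30L, 45L, NA, 50L),
  temp_9am = c(20.1, 21.0, 19.5, 22.3),  # no NAs: should be left unchanged
  stringsAsFactors = FALSE
)

set_na_to_median <- function(data_frame, column_name) {
  column <- data_frame[[column_name]]
  # Replace missing values in the column vector with the column median
  column[is.na(column)] <- median(column, na.rm = TRUE)
  data_frame[[column_name]] <- column
  # Return the modified copy; the caller's object is never changed in place
  data_frame
}

# Skip the first two columns, as in the question
for (item in names(weather_df)[-(1:2)]) {
  if (is.numeric(weather_df[[item]])) {
    weather_df <- set_na_to_median(weather_df, item)  # store the returned copy
  }
}
```

After the loop, the NA in rainfall becomes 1.2 and the NA in gust_kmh becomes 45, while station, date and temp_9am are unchanged.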
Answer:
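To check whether a column is numeric, use class() instead of typeof():
col_type_vector <- sapply(main_df, class)
typeof() reports the internal storage type, so a factor column shows up as "integer" and a Date column as "double", and your loop would treat both as numeric.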
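A small illustration of the difference, assuming a hypothetical three-column data frame with a factor, a Date and a plain numeric column. It compares typeof(), class() and is.numeric() on each column.

```r
# Hypothetical columns showing why typeof() is a poor test for "numeric"
example_df <- data.frame(
  wind_dir = factor(c("N", "S", "N")),
  obs_date = as.Date(c("2021-01-01", "2021-01-02", "2021-01-03")),
  rainfall = c(0.2, NA, 1.4)
)

storage_types <- sapply(example_df, typeof)      # internal storage mode
column_classes <- sapply(example_df, class)      # class attribute
numeric_flags <- sapply(example_df, is.numeric)  # TRUE only for real numbers

storage_types   # wind_dir "integer", obs_date "double", rainfall "double"
column_classes  # wind_dir "factor",  obs_date "Date",   rainfall "numeric"
numeric_flags   # wind_dir FALSE,     obs_date FALSE,    rainfall TRUE
```

One caveat with class(): some column types carry more than one class (a POSIXct column has c("POSIXct", "POSIXt")), in which case sapply() returns a list instead of a character vector. Testing each column with is.numeric() avoids that.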
However, I think there is a simpler way to do this. Using dplyr:
library(dplyr)
main_df <- main_df %>%
    mutate(across(where(is.numeric),
                  ~replace(., is.na(.), median(., na.rm = TRUE))))
main_df
You can also do this in base R:
main_df[] <- lapply(main_df, function(x)
    if (is.numeric(x)) replace(x, is.na(x), median(x, na.rm = TRUE)) else x)
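Both versions above touch every numeric column, including the first two that the question wants to skip. A minimal base R sketch that restricts the same replace() call to columns 3 onward, assuming a hypothetical weather_df whose second column is numeric so the skip is visible:

```r
# Hypothetical stand-in for main_df; the first two columns are identifiers
weather_df <- data.frame(
  station = c("A", "B", "C"),
  day = c(1L, 2L, NA),          # numeric, but must be left alone
  rainfall = c(0.5, NA, 2.5),
  cloud_oktas = c(NA, 4L, 6L)
)

fill_cols <- names(weather_df)[-(1:2)]  # every column except the first two
weather_df[fill_cols] <- lapply(weather_df[fill_cols], function(x)
  if (is.numeric(x)) replace(x, is.na(x), median(x, na.rm = TRUE)) else x)
```

Here day keeps its NA, rainfall becomes 0.5, 1.5, 2.5, and cloud_oktas becomes 5, 4, 6. Note that cloud_oktas turns from integer into double, because the median of an even number of integers is their mean.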