NA values are not replaced inside a function, but the same code works outside the function in R

I have a tibble main_df with columns such as "Rainfall_(mm)", "Speed_of_maximum_wind_gust_(km/h)", "9am_Temperature", "9am_relative_humidity_(%)", "9am_cloud_amount_(oktas)", etc. I identify the column types with col_type_vector <- sapply(main_df, typeof), and for every numeric column I want to replace the NA values with the median of that column. Note that the loop starts at 3 because I don't want to touch the first 2 columns. The function and the loop are given below:

set_na_to_median <- function(data_frame, column_name) {
  median_value <- median(data_frame[[column_name]], na.rm = TRUE)
  na_indices <- which(is.na(data_frame[column_name]))
  data_frame[na_indices, column_name] <- median_value
}

col_type_vector <- sapply(main_df, typeof)
for (item in names(col_type_vector)[3:length(names(col_type_vector))]) {
  if (col_type_vector[item] == "integer" | col_type_vector[item] == "double" | col_type_vector[item] == "numeric") {
    set_na_to_median(main_df, item)
  }
}

But when I run it, the NA values do not get replaced. If I run the same code manually, outside the function and the loop, it works perfectly. I have basically wasted my whole day on this. What am I doing wrong? Thanks in advance.

CodePudding user response：

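R functions work on a copy of their arguments, so the change made inside set_na_to_median never reaches main_df. You need to return the modified tibble and assign it somewhere. Add data_frame as the last line of set_na_to_median (as written, the function returns the value of its final assignment, i.e. the median, not the tibble), then try replacing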

set_na_to_median(main_df, item)

with

main_df <- set_na_to_median(main_df, item)
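Putting both changes together, here is a minimal sketch of how the fixed function and loop could look. The toy_df data frame and its column names are made up for illustration and stand in for main_df; the loop uses is.numeric instead of the typeof comparison, which is a simplification and not part of the original code.

```r
# Hypothetical toy data standing in for main_df (column names are invented).
toy_df <- data.frame(
  id = c("a", "b", "c", "d"),
  rainfall = c(1, NA, 3, 5),
  temp_9am = c(NA, 12, 14, 16),
  stringsAsFactors = FALSE
)

set_na_to_median <- function(data_frame, column_name) {
  median_value <- median(data_frame[[column_name]], na.rm = TRUE)
  na_indices <- which(is.na(data_frame[[column_name]]))
  # This only modifies the function's local copy of data_frame ...
  data_frame[na_indices, column_name] <- median_value
  # ... so the copy has to be returned explicitly.
  data_frame
}

for (item in names(toy_df)) {
  if (is.numeric(toy_df[[item]])) {
    # Assign the returned copy back, otherwise the result is discarded.
    toy_df <- set_na_to_median(toy_df, item)
  }
}

print(toy_df)
```

Without the final data_frame line, the assignment in the loop would overwrite toy_df with a single number, because an assignment expression evaluates to the assigned value.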

CodePudding user response：

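To check the type of a column, use class rather than typeof (col_type_vector <- sapply(main_df, class)). typeof reports the storage type, so it returns "integer" or "double" and never "numeric", and it also reports factors as "integer".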
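To see the difference, here is a small sketch on a made-up data frame (toy_df and its columns are hypothetical). It also shows is.numeric, which avoids comparing type names as strings.

```r
# Hypothetical data with an integer, a double, a factor and a Date column.
toy_df <- data.frame(
  count = c(1L, 2L),
  amount = c(1.5, 2.5),
  group = factor(c("x", "y")),
  day = as.Date(c("2020-01-01", "2020-01-02"))
)

# typeof: storage type; factors are stored as integers, Dates as doubles.
types <- sapply(toy_df, typeof)

# class: what the column represents.
classes <- sapply(toy_df, class)

# is.numeric is FALSE for factors and Dates, so only true numbers are kept.
numeric_cols <- names(toy_df)[sapply(toy_df, is.numeric)]

print(types)
print(classes)
print(numeric_cols)
```

Filtering on typeof alone would try to take the median of the factor and Date columns as well, which is usually not what you want.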

However, I think there is an easier way to do this.

Using dplyr -

library(dplyr)

main_df <- main_df %>%
  mutate(across(where(is.numeric),
                ~ replace(., is.na(.), median(., na.rm = TRUE))))

main_df

You may also use base R to do this -

main_df[] <- lapply(main_df, function(x) 
  if(is.numeric(x)) replace(x, is.na(x), median(x, na.rm = TRUE)) else x)
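Both versions above touch every numeric column. If the first 2 columns should be left alone, as in the question, the base R version can be restricted to the remaining columns. A minimal sketch, assuming a made-up toy_df in place of main_df:

```r
# Hypothetical toy data; the first two columns are numeric but must stay untouched.
toy_df <- data.frame(
  station = c(1, NA, 3),
  year = c(2020, 2021, NA),
  rainfall = c(0, NA, 4),
  gust_dir = c("N", NA, "SW"),
  stringsAsFactors = FALSE
)

# All column positions except the first two.
cols <- seq_along(toy_df)[-(1:2)]

# Replace NA with the column median in the selected numeric columns only.
toy_df[cols] <- lapply(toy_df[cols], function(x) {
  if (is.numeric(x)) replace(x, is.na(x), median(x, na.rm = TRUE)) else x
})

print(toy_df)
```

The NA values in station, year and the character column gust_dir are kept; only rainfall is filled in.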