I need to fill the missing (`NA`) values in all the numeric columns of a data frame with each column's median. I wrote the following code:
```r
median_forNumericalNulls <- function(dataframe){
  nums <- unlist(lapply(dataframe, is.numeric))
  df_num <- dataframe[ , nums]
  df_num[] <- lapply(df_num, function(x) {
    x[is.na(x)] <- median(x, na.rm = TRUE)
    x
  })
  return(dataframe)
}

median_forNumericalNulls(A)
```
`A` is the parent data frame, which contains both numeric and categorical variables. How can I replace the columns of `A` with the output of `median_forNumericalNulls`?
Is there a better way to do this?
Answer:
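Maybe we need to change the function so that it subsets and updates the numeric columns of `dataframe` directly, instead of creating another object (`df_num`) and updating that. In the original version only `df_num` is modified, and the function then returns the untouched `dataframe`, so `A` never changes.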
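For reference, here is a minimal sketch of why the question's version has no effect. It uses a made-up data frame `d` and a hypothetical helper `modify_copy()` that mirrors the same pattern: R copies on modify, so changing a subset object never alters the data frame it was taken from.

```r
# Hypothetical demo data: one numeric column with a missing value
d <- data.frame(x = c(1, NA, 3), g = c("a", "b", "c"))

# Hypothetical helper mirroring the question's pattern:
# it edits a subset copy but returns the untouched input
modify_copy <- function(d) {
  d_sub <- d[, "x", drop = FALSE]   # a separate object
  d_sub$x[is.na(d_sub$x)] <- 0      # only the copy changes
  d                                 # the input is returned as it was
}

result <- modify_copy(d)
identical(result, d)   # TRUE: the NA in x is still there
```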
```r
median_forNumericalNulls <- function(dataframe){
  # logical vector flagging the numeric columns
  nums <- unlist(lapply(dataframe, is.numeric))
  # overwrite those columns in place, filling NAs with the column median
  dataframe[nums] <- lapply(dataframe[nums], function(x) {
    x[is.na(x)] <- median(x, na.rm = TRUE)
    x
  })
  dataframe
}
```
Testing:
```r
A <- median_forNumericalNulls(A)
```
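As a quick sanity check, here is a sketch assuming a small hypothetical data frame `toy` in place of `A`, with character, numeric, and factor columns. It shows that only the numeric columns get filled:

```r
# Corrected function from the answer above
median_forNumericalNulls <- function(dataframe) {
  nums <- unlist(lapply(dataframe, is.numeric))
  dataframe[nums] <- lapply(dataframe[nums], function(x) {
    x[is.na(x)] <- median(x, na.rm = TRUE)
    x
  })
  dataframe
}

# Hypothetical mixed-type data frame standing in for A
toy <- data.frame(
  id    = c("a", "b", "c", "d"),
  x     = c(1, NA, 3, 10),
  y     = c(NA, 2, 4, NA),
  group = factor(c("u", NA, "v", "u"))
)

filled <- median_forNumericalNulls(toy)
filled
```

Here `x` has its `NA` replaced by 3 (the median of 1, 3 and 10), and `y` has both `NA`s replaced by 3 (the median of 2 and 4). The `NA` in the factor column `group` is left alone because only numeric columns are selected.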
This can also be done more compactly with `na.aggregate` from the zoo package:
```r
library(zoo)
A <- na.aggregate(A, FUN = median)
```
Or, using the tidyverse (dplyr):
```r
library(dplyr)

A <- A %>%
  mutate(across(where(is.numeric),
                ~ replace(., is.na(.), median(., na.rm = TRUE))))
```
Answer:
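Here is another approach, shown on an example built from the first rows of `iris`:
```r
library(dplyr)

iris1 <- iris %>%
  select(1, 2, 5)   # Sepal.Length, Sepal.Width, Species

head(iris1, 10) %>%
  as_tibble() %>%
  # turn values <= 3 into NA, just to create some missing values
  mutate(across(where(is.numeric), ~ifelse(. <= 3, NA, .))) %>%
  # fill each numeric column's NAs with that column's median
  mutate(across(where(is.numeric), ~ifelse(is.na(.), median(., na.rm = TRUE), .)))
```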
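```
   Sepal.Length Sepal.Width Species
          <dbl>       <dbl> <fct>
 1          5.1         3.5 setosa
 2          4.9         3.4 setosa
 3          4.7         3.2 setosa
 4          4.6         3.1 setosa
 5          5           3.6 setosa
 6          5.4         3.9 setosa
 7          4.6         3.4 setosa
 8          5           3.4 setosa
 9          4.4         3.4 setosa
10          4.9         3.1 setosa
```
Rows 2 and 9, whose `Sepal.Width` values (3.0 and 2.9) were turned into `NA`, are filled with the median of the remaining widths, 3.4.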
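The `iris` example has to manufacture its missing values. As a further sketch, the base R `airquality` dataset already contains `NA`s in `Ozone` and `Solar.R`, so the same column-wise median fill can be checked there without any packages (the object names `aq` and `num_cols` are just illustrative):

```r
# airquality ships with base R; Ozone and Solar.R contain real NAs
aq <- airquality

# flag the numeric columns (vapply guarantees a logical vector)
num_cols <- vapply(aq, is.numeric, logical(1))

# fill each numeric column's NAs with that column's median
aq[num_cols] <- lapply(aq[num_cols], function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
})

colSums(is.na(airquality))  # NA counts before
colSums(is.na(aq))          # all zero afterwards
```

`vapply(..., logical(1))` is used instead of `unlist(lapply(...))` only because it always returns a logical vector; both select the same columns here. Integer columns may be promoted to double when a fractional median is assigned into them.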