How to replace NA's in numerical columns with the median of those columns?

Time:09-27

I am working on a data frame with multiple data types. I would like to replace NA values, only in the numeric columns, with the median of that particular column. I have seen many questions about replacing NAs with the mean, but not with the median. My df is similar to the following code:

library(dplyr)  # needed for %>% and select_if

my_groups <- c(rep("A", 5), rep("B", 5))
my_values_1 <- c(4, 9, 10, NA, 5, 12, NA, 7, 11, 8)
my_values_2 <- c(3, NA, 4, 8, 2, 11, 15, NA, 9, 10)
my_df <- data.frame(my_groups, my_values_1, my_values_2)
my_df %>% select_if(is.numeric)

This gives me the numeric columns, but I can't figure out the next step.

CodePudding user response:

1) After inserting some NAs into the first column of the built-in BOD data frame, we have:

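library(dplyr)
BOD$Time[1:2] <- NA

# Replace each NA in x with the median of the non-missing values of x
na.median <- function(x) replace(x, is.na(x), median(x, na.rm = TRUE))
# Apply na.median to every numeric column
BOD %>% mutate(across(where(is.numeric), na.median))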

giving:

  Time demand
1  4.5    8.3
2  4.5   10.3
3  3.0   19.0
4  4.0   16.0
5  5.0   15.6
6  7.0   19.8

2) Or, using only base R with na.median from above:

ok <- sapply(BOD, is.numeric)
replace(BOD, ok, lapply(BOD[ok], na.median))
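As a rough sketch of how the base R approach in 2) might be applied to the my_df from the question (the data frame is rebuilt here so the snippet runs on its own, and the name my_df_filled is only an illustrative choice, not something from the question):

```r
# Rebuild the example data from the question
my_df <- data.frame(
  my_groups   = c(rep("A", 5), rep("B", 5)),
  my_values_1 = c(4, 9, 10, NA, 5, 12, NA, 7, 11, 8),
  my_values_2 = c(3, NA, 4, 8, 2, 11, 15, NA, 9, 10)
)

# Replace each NA in x with the median of the non-missing values of x
na.median <- function(x) replace(x, is.na(x), median(x, na.rm = TRUE))

# Logical vector marking which columns are numeric
ok <- sapply(my_df, is.numeric)

# Overwrite only the numeric columns; my_groups is left untouched
# (my_df_filled is a hypothetical name used for illustration)
my_df_filled <- my_df
my_df_filled[ok] <- lapply(my_df[ok], na.median)

# Count the NAs remaining in each column
colSums(is.na(my_df_filled))
```

Assigning into a copy keeps the original my_df available for comparison; replace(my_df, ok, lapply(my_df[ok], na.median)) should give the same result in one expression.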

CodePudding user response:

We could use mutate with across and an ifelse statement. Note: D. Grothendieck's answer also works perfectly!

library(dplyr)
my_df %>% 
  mutate(across(where(is.numeric), ~ ifelse(is.na(.), median(., na.rm = TRUE), .)))

output:

   my_groups my_values_1 my_values_2
1          A         4.0         3.0
2          A         9.0         8.5
3          A        10.0         4.0
4          A         8.5         8.0
5          A         5.0         2.0
6          B        12.0        11.0
7          B         8.5        15.0
8          B         7.0         8.5
9          B        11.0         9.0
10         B         8.0        10.0
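A possible variation on the same idea, offered only as a sketch and not as part of the answer above: dplyr's coalesce() can stand in for the ifelse() call, since it fills each missing value with the first non-missing alternative. The name my_df_coalesced is an illustrative assumption, and the example assumes the numeric columns are doubles, as they are in the question's data.

```r
library(dplyr)

# Rebuild the example data from the question
my_df <- data.frame(
  my_groups   = c(rep("A", 5), rep("B", 5)),
  my_values_1 = c(4, 9, 10, NA, 5, 12, NA, 7, 11, 8),
  my_values_2 = c(3, NA, 4, 8, 2, 11, 15, NA, 9, 10)
)

# For every numeric column, fill NAs with that column's median.
# coalesce() keeps non-missing values and substitutes the median elsewhere.
# (my_df_coalesced is a hypothetical name used for illustration)
my_df_coalesced <- my_df %>%
  mutate(across(where(is.numeric), ~ coalesce(.x, median(.x, na.rm = TRUE))))

my_df_coalesced
```

With either version, a column that consists entirely of NA would stay NA, because median(x, na.rm = TRUE) of an empty set of values is itself NA.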