Patching missing dataframe values with-CodePudding

I have a large dataset with severall missing (NA) values. There are 3 replications for each variety. So if im missing 1 I can average the other 2 and replace the missing one with that. Or if there is only one known value use it in place of others. Im not sure where to begin but i found a simmilar topic, but its for python (understandhttps://stackoverflow.com/questions/29359134/in-pandas-how-can-i-patch-a-dataframe-with-missing-values-with-values-from-anot)

Here is an example data frame:

A <- c(1, 1, 1, 5, 5, 5, 8, 8, 8)
B <- c(10, 20, 50, 100, 100, 50, 200, 300)
C <- c(10, 10, 10, 50, 50, 50, 100)
length(A)<-9; length(B)<-9; length(C)<-9
df <-cbind(A,B,C)

The idea is to fill the NA with average from B cell by the group of 8 found in A cell. And in the C cell the NA should be filled with the only 8 in the A cell. In the real dataset there will be many more groups with NA, i just didnt know how to make a df like that :D

It seems as a simple task I just dont have the coding knowledge to solve this.

Thanks in advance!

CodePudding user response：

I think this should do the trick. Note that I converted your data into a data frame object beforehand.


A <- c(1, 1, 1, 5, 5, 5, 8, 8, 8)
B <- c(10, 20, 50, 100, 100, 50, 200, 300)
C <- c(10, 10, 10, 50, 50, 50, 100)
length(A)<-9; length(B)<-9; length(C)<-9

df <- data.frame(A,B,C)

library(dplyr)

newdf <- df %>% mutate_all( ~ case_when(is.na(.) ~ mean(., na.rm=TRUE),
                                     TRUE ~ .))

The mutate_all means the action will be done to all the columns. If you only want to do it to some of the columns use mutate_at instead.

CodePudding user response：

library(tidyverse)

df %>% 
  group_by(A) %>% 
  mutate(across(everything(), ~ replace_na(.x, mean(.x, na.rm = TRUE))))

# A tibble: 9 × 3
# Groups:   A [3]
      A     B     C
  <dbl> <dbl> <dbl>
1     1    10    10
2     1    20    10
3     1    50    10
4     5   100    50
5     5   100    50
6     5    50    50
7     8   200   100
8     8   300   100
9     8   250   100