I have a large dataset with severall missing (NA) values. There are 3 replications for each variety. So if im missing 1 I can average the other 2 and replace the missing one with that. Or if there is only one known value use it in place of others. Im not sure where to begin but i found a simmilar topic, but its for python (understandhttps://stackoverflow.com/questions/29359134/in-pandas-how-can-i-patch-a-dataframe-with-missing-values-with-values-from-anot)
Here is an example data frame:
A <- c(1, 1, 1, 5, 5, 5, 8, 8, 8)
B <- c(10, 20, 50, 100, 100, 50, 200, 300)
C <- c(10, 10, 10, 50, 50, 50, 100)
length(A)<-9; length(B)<-9; length(C)<-9
df <-cbind(A,B,C)
The idea is to fill the NA with average from B cell by the group of 8 found in A cell. And in the C cell the NA should be filled with the only 8 in the A cell. In the real dataset there will be many more groups with NA, i just didnt know how to make a df like that :D
It seems as a simple task I just dont have the coding knowledge to solve this.
Thanks in advance!
CodePudding user response:
I think this should do the trick. Note that I converted your data into a data frame object beforehand.
A <- c(1, 1, 1, 5, 5, 5, 8, 8, 8)
B <- c(10, 20, 50, 100, 100, 50, 200, 300)
C <- c(10, 10, 10, 50, 50, 50, 100)
length(A)<-9; length(B)<-9; length(C)<-9
df <- data.frame(A,B,C)
library(dplyr)
newdf <- df %>% mutate_all( ~ case_when(is.na(.) ~ mean(., na.rm=TRUE),
TRUE ~ .))
The mutate_all
means the action will be done to all the columns. If you only want to do it to some of the columns use mutate_at
instead.
CodePudding user response:
library(tidyverse)
df %>%
group_by(A) %>%
mutate(across(everything(), ~ replace_na(.x, mean(.x, na.rm = TRUE))))
# A tibble: 9 × 3
# Groups: A [3]
A B C
<dbl> <dbl> <dbl>
1 1 10 10
2 1 20 10
3 1 50 10
4 5 100 50
5 5 100 50
6 5 50 50
7 8 200 100
8 8 300 100
9 8 250 100