Fill NA values by id with mean and existing values-CodePudding

I have a dataset with three periods for each ID. I this way, I am trying to fill NA values in my dataset with the mean if there are 2/3 values without NA (the mean of the existing values by id). On the other hand, If there is just 1/3 value without NA, I want to fill the rest with that value. My dataset looks like this:

  ID value period
1203 -0.64 1
1203       2
1203 -0.12 3
1218 0.495 1
1218       2
1218       3

So, I want to fill the NA value in the ID 1203 with the mean of -0.64 and -0.12, and for the ID 1218 I want to fil with 0.495.

CodePudding user response：

If you are comfortable with the tidyverse approach, you just need to use the replace_na() function in mutate().

This will replace NA in the value column with mean of value.

library(tidyverse)

df %>% 
  group_by(ID) %>% 
  mutate(value = replace_na(value, mean(value, na.rm = T)))

# A tibble: 6 x 3
# Groups:   ID [2]
  ID     value period
  <chr>  <dbl> <chr> 
1 1203  -0.64  1     
2 1203  -0.38  2     
3 1203  -0.12  3     
4 1218   0.495 1     
5 1218   0.495 2     
6 1218   0.495 3

CodePudding user response：

The nice thing here is that regardless of how many NAs there are, we can just put in the mean, because the mean of a single number is just itself. The code below creates a table similar to yours, using NA instead of the blank spaces

library(tidyr)

ID=rep(1:10,each=3)
value=runif(30)
value[sample(1:30,10)]=NA
period=rep(1:3,10)


data=data.frame(ID,value,period)

We need tidyr because this is going to be easier if we convert your data to a "wide" format. Once we do that, we can apply mean to each row of the new table, for the columns that are the values (this is why we select columns 2:4, to exclude the id column).

wideData=pivot_wider(data,id_cols=ID,names_from=period,values_from=value)

wideData[,2:4]=apply(wideData[,2:4],1,mean,na.rm=T)

If you want your data back in the original format, just run the following line to "lengthen" it back out.

dataLong=pivot_longer(wideData,c(2:4),names_to='period')