I have a dataset with three periods for each ID. I this way, I am trying to fill NA
values in my dataset with the mean if there are 2/3 values without NA
(the mean of the existing values by id). On the other hand, If there is just 1/3 value without NA
, I want to fill the rest with that value. My dataset looks like this:
ID value period
1203 -0.64 1
1203 2
1203 -0.12 3
1218 0.495 1
1218 2
1218 3
So, I want to fill the NA
value in the ID
1203
with the mean of -0.64
and -0.12
, and for the ID
1218
I want to fil with 0.495
.
CodePudding user response:
If you are comfortable with the tidyverse
approach, you just need to use the replace_na()
function in mutate()
.
This will replace NA
in the value
column with mean of value
.
library(tidyverse)
df %>%
group_by(ID) %>%
mutate(value = replace_na(value, mean(value, na.rm = T)))
# A tibble: 6 x 3
# Groups: ID [2]
ID value period
<chr> <dbl> <chr>
1 1203 -0.64 1
2 1203 -0.38 2
3 1203 -0.12 3
4 1218 0.495 1
5 1218 0.495 2
6 1218 0.495 3
CodePudding user response:
The nice thing here is that regardless of how many NAs there are, we can just put in the mean, because the mean of a single number is just itself. The code below creates a table similar to yours, using NA instead of the blank spaces
library(tidyr)
ID=rep(1:10,each=3)
value=runif(30)
value[sample(1:30,10)]=NA
period=rep(1:3,10)
data=data.frame(ID,value,period)
We need tidyr because this is going to be easier if we convert your data to a "wide" format. Once we do that, we can apply mean to each row of the new table, for the columns that are the values (this is why we select columns 2:4, to exclude the id column).
wideData=pivot_wider(data,id_cols=ID,names_from=period,values_from=value)
wideData[,2:4]=apply(wideData[,2:4],1,mean,na.rm=T)
If you want your data back in the original format, just run the following line to "lengthen" it back out.
dataLong=pivot_longer(wideData,c(2:4),names_to='period')