Home > database >  R data frame - fill missing values with condition on another column
R data frame - fill missing values with condition on another column

Time:12-15

In R, I have a the following data frame:

Id Year Age
1 2000 25
1 2001 NA
1 2002 NA
2 2000 NA
2 2001 30
2 2002 NA

Each Id has at least one row with age filled. I would like to fill the missing "Age" values with the correct age for each ID.

Expected result:

Id Year Age
1 2000 25
1 2001 25
1 2002 25
2 2000 30
2 2001 30
2 2002 30

I've tried using 'fill':

df %>% fill(age)

But not getting the expected results. Is there a simple way to do this?

CodePudding user response:

Assuming this is your dataframe

df<-data.frame(id=c(1,1,1,2,2,2),year=c(2000,2001,2002,2000,2001,2002),age=c(25,NA,NA,NA,30,NA))

With the zoo package, you can try

library(zoo)
df<-df[order(df$id,df$age),]
df$age<-na.locf(df$age)

CodePudding user response:

Please see the solution below with the tidyverse library.

library(tidyverse)    
dt  <- data.frame(Id = rep(1:2, each = 3),
                  Year = rep(2000:2002, each = 2),
                  Age = c(25,NA,NA,30,NA,NA))

dt %>% group_by(Id) %>% arrange(Id,Age) %>% fill(Age)

In the code you provided, you didn't use group_by. It is also important to arrange by Id and Age, because the function fill only fills the column down. See for example that data frame, and compare the option with and without arrange:

dt  <- data.frame(Id = rep(1:2, each = 3),
                  Year = rep(2000:2002, each = 2),
                  Age = c(NA, 25,NA,NA,30,NA))

dt %>% group_by(Id) %>% fill(Age) # only fills partially
dt %>% group_by(Id) %>% arrange(Id,Age) %>% fill(Age) # does the right job

CodePudding user response:

The comments were close, you just have to add the .direction

df %>% group_by(Id) %>% fill(Age, .direction="downup")
# A tibble: 6 x 3
# Groups:   Id [2]
     Id  Year   Age
  <int> <int> <int>
1     1  2000    25
2     1  2001    25
3     1  2002    25
4     2  2000    30
5     2  2001    30
6     2  2002    30
  • Related