In R, I have a the following data frame:
Id | Year | Age |
---|---|---|
1 | 2000 | 25 |
1 | 2001 | NA |
1 | 2002 | NA |
2 | 2000 | NA |
2 | 2001 | 30 |
2 | 2002 | NA |
Each Id has at least one row with age filled. I would like to fill the missing "Age" values with the correct age for each ID.
Expected result:
Id | Year | Age |
---|---|---|
1 | 2000 | 25 |
1 | 2001 | 25 |
1 | 2002 | 25 |
2 | 2000 | 30 |
2 | 2001 | 30 |
2 | 2002 | 30 |
I've tried using 'fill':
df %>% fill(age)
But not getting the expected results. Is there a simple way to do this?
CodePudding user response:
Assuming this is your dataframe
df<-data.frame(id=c(1,1,1,2,2,2),year=c(2000,2001,2002,2000,2001,2002),age=c(25,NA,NA,NA,30,NA))
With the zoo
package, you can try
library(zoo)
df<-df[order(df$id,df$age),]
df$age<-na.locf(df$age)
CodePudding user response:
Please see the solution below with the tidyverse
library.
library(tidyverse)
dt <- data.frame(Id = rep(1:2, each = 3),
Year = rep(2000:2002, each = 2),
Age = c(25,NA,NA,30,NA,NA))
dt %>% group_by(Id) %>% arrange(Id,Age) %>% fill(Age)
In the code you provided, you didn't use group_by
. It is also important to arrange
by Id and Age, because the function fill
only fills the column down. See for example that data frame, and compare the option with and without arrange
:
dt <- data.frame(Id = rep(1:2, each = 3),
Year = rep(2000:2002, each = 2),
Age = c(NA, 25,NA,NA,30,NA))
dt %>% group_by(Id) %>% fill(Age) # only fills partially
dt %>% group_by(Id) %>% arrange(Id,Age) %>% fill(Age) # does the right job
CodePudding user response:
The comments were close, you just have to add the .direction
df %>% group_by(Id) %>% fill(Age, .direction="downup")
# A tibble: 6 x 3
# Groups: Id [2]
Id Year Age
<int> <int> <int>
1 1 2000 25
2 1 2001 25
3 1 2002 25
4 2 2000 30
5 2 2001 30
6 2 2002 30