I have patchy data on property rental prices over the 8 years 2014 to 2021. I want to expand to a full panel so that for each year every property has a value for each of the years. This is done by carrying over the value from one year to another. At the moment each property is an entry in a list of data frames. This R code creates some dummy data (df1, df2, df3) and gives the result (df4) that I am looking for.
year <- c(2014, 2019, 2021)
price <- c(100, 110, 120)
df1 <- data.frame(cbind(id=1, year, price))
year <- c(2016, 2019, 2021)
price <- c(200, 210, 220)
df2 <- data.frame(cbind(id=2, year, price))
year <-c (2014, 2015, 2019, 2020)
price <-c (300, 310, 320, 330)
df3 <- data.frame(cbind(id=3, year, price))
list1 <- list(df1, df2, df3)
id <- c(rep(1,8), rep(2,8), rep(3,8))
year <- c(rep(seq(2014,2021), 3))
price <- c(100, 100, 100, 100, 100, 110, 110, 120,
200, 200, 200, 200, 200, 210, 210, 220,
300, 310, 310, 310, 310, 320, 330, 330)
df4 <- data.frame(id, year, price)
So for property id=1 I have data for 2014, 2019 and 2021. I will forward fill in 2015 to 2018 with the 2014 price. For property id=2 I will back fill in 2014 and 2015 with 2016, and also forward fill from 2017 to 2018 with 2016.
I can start this by rbinding all the data frames in the list and then merging this with a new data frame that has all the id by year combinations. But this will not do the infilling that I want, all the missing id by year combinations will be NA.
In total I have about 100k properties to do this task for. Thanks.
CodePudding user response:
Or with purrr
years <- tibble(year = c(2014:2021))
list1 %>%
map_dfr(~.x %>%
right_join(years, by = "year") %>%
arrange(year) %>%
fill(id, .direction = "down") %>%
fill(id, .direction = "up") %>%
fill(price, .direction = "downup"))
CodePudding user response:
Check out the fill()
function function through tidyverse
.
Using your example, but inducing the NA's you mention, df5
should be what you're looking for here.
library( tidyverse )
year <- c(2014, 2019, 2021)
price <- c(100, 110, 120)
df1 <- data.frame(cbind(id=1, year, price))
year <- c(2016, 2019, 2021)
price <- c(200, 210, 220)
df2 <- data.frame(cbind(id=2, year, price))
year <-c (2014, 2015, 2019, 2020)
price <-c (300, 310, 320, 330)
df3 <- data.frame(cbind(id=3, year, price))
list1 <- list(df1, df2, df3)
id <- c(rep(1,8), rep(2,8), rep(3,8))
year <- c(rep(seq(2014,2021), 3))
price <- c(100, NA, NA, NA, NA, 110, NA, 120,
NA, NA, 200, NA, NA, 210, 210, 220,
300, 310, 310, 310, 310, 320, 330, 330)
df4 <- data.frame(id, year, price)
df5 <- df4 %>% group_by( id ) %>% fill( price, .direction = "downup" )