I have the following data structure, where I want to interpolate the data row-wise until a certain year:
require('data.table')
test_dt <- data.table(iso1 = c('BTN', 'IND', 'BGD'),
iso2 = c('AFG', 'AFG', 'AFG'),
year = c(2006, 2003, 2006))
I came up with the following function that works well for the single-row case, but does not work for the general case:
interpolate_rows <- function(dt, stop_year = 2008) {
year <- as.integer(dt[, .SD, .SDcols = 'year'])
# If year is less than stop year, fill in observations:
if (year < stop_year) {
time_delta <- seq(year, stop_year)
# Explode bilateral country observation:
dt <- dt[rep(dt[, .I], length(time_delta))]
# Replace year column w/ time_delta sequence:
dt <- dt[, year := time_delta]
}
return(dt)
}
## Output
bar <- interpolate_rows(test_dt[1])
bar
iso1 iso2 year
1: BTN AFG 2006
2: BTN AFG 2007
3: BTN AFG 2008
What I'd like to have is the following:
bar <- interpolate_rows(test_dt)
bar
iso1 iso2 year
1: BTN AFG 2006
2: BTN AFG 2007
3: BTN AFG 2008
6: IND AFG 2003
7: IND AFG 2004
8: IND AFG 2005
9: IND AFG 2006
10: IND AFG 2007
11: IND AFG 2008
14: BGD AFG 2006
14: BGD AFG 2007
14: BGD AFG 2008
I know the culprit is most likely this line
year <- as.integer(dt[, .SD, .SDcols = 'year'])
, but I got no clue how to substitute this for a working vector solution. I tried to nest an lapply()
function within interpolate_rows()
to extract the year for each unique group and played around with Map()
, but none of these yielded working solutions.
Any help pointing me to feasible vector solution, would be greatly appreciated.
CodePudding user response:
What about simply using by
:
test_dt[, .(year = min(year):stop_year), by = .(iso1, iso2)]
# iso1 iso2 year
# 1: BTN AFG 2006
# 2: BTN AFG 2007
# 3: BTN AFG 2008
# 4: IND AFG 2003
# 5: IND AFG 2004
# 6: IND AFG 2005
# 7: IND AFG 2006
# 8: IND AFG 2007
# 9: IND AFG 2008
# 10: BGD AFG 2006
# 11: BGD AFG 2007
# 12: BGD AFG 2008
CodePudding user response:
One way using dplyr
and tidyr
libraries.
library(dplyr)
library(tidyr)
interpolate_rows <- function(dt, stop_year = 2008) {
dt %>%
group_by(iso1, iso2) %>%
complete(year = year : stop_year) %>%
ungroup
}
interpolate_rows(test_dt)
# iso1 iso2 year
# <chr> <chr> <dbl>
# 1 BGD AFG 2006
# 2 BGD AFG 2007
# 3 BGD AFG 2008
# 4 BTN AFG 2006
# 5 BTN AFG 2007
# 6 BTN AFG 2008
# 7 IND AFG 2003
# 8 IND AFG 2004
# 9 IND AFG 2005
#10 IND AFG 2006
#11 IND AFG 2007
#12 IND AFG 2008
Another way -
library(data.table)
interpolate_rows <- function(dt, stop_year = 2008) {
vals <- seq(dt$year, stop_year)
dt[rep(1, length(vals))][, year := vals]
}
rbindlist(by(test_dt, seq(nrow(test_dt)), interpolate_rows))