Home > Software engineering >  Generalise a function to vector format data.table
Generalise a function to vector format data.table

Time:10-19

I have the following data structure, where I want to interpolate the data row-wise until a certain year:

require('data.table')
test_dt <- data.table(iso1 = c('BTN', 'IND', 'BGD'),
 iso2 = c('AFG', 'AFG', 'AFG'),
 year = c(2006, 2003, 2006))

I came up with the following function that works well for the single-row case, but does not work for the general case:

interpolate_rows <- function(dt, stop_year = 2008)  {
  
    year <- as.integer(dt[, .SD, .SDcols = 'year'])
    
    # If year is less than stop year, fill in observations:
    if (year < stop_year) {
      time_delta <- seq(year, stop_year)
            
      # Explode bilateral country observation:
      dt <- dt[rep(dt[, .I], length(time_delta))]
      
      # Replace year column w/ time_delta sequence:
      dt <- dt[, year := time_delta]
      }
    
  return(dt)
}
## Output
bar <- interpolate_rows(test_dt[1])

bar
  iso1  iso2   year
 1:  BTN    AFG    2006
 2:  BTN    AFG    2007
 3:  BTN    AFG    2008

What I'd like to have is the following:

bar <- interpolate_rows(test_dt)

bar
  iso1  iso2   year
 1:  BTN    AFG    2006
 2:  BTN    AFG    2007
 3:  BTN    AFG    2008
 6:  IND    AFG    2003
 7:  IND    AFG    2004
 8:  IND    AFG    2005
 9:  IND    AFG    2006
 10:  IND    AFG    2007
 11:  IND    AFG    2008
 14:  BGD    AFG    2006
 14:  BGD    AFG    2007
 14:  BGD    AFG    2008

I know the culprit is most likely this line year <- as.integer(dt[, .SD, .SDcols = 'year']), but I got no clue how to substitute this for a working vector solution. I tried to nest an lapply() function within interpolate_rows() to extract the year for each unique group and played around with Map(), but none of these yielded working solutions.

Any help pointing me to feasible vector solution, would be greatly appreciated.

CodePudding user response:

What about simply using by:

test_dt[, .(year = min(year):stop_year), by = .(iso1, iso2)]

#     iso1 iso2 year
#  1:  BTN  AFG 2006
#  2:  BTN  AFG 2007
#  3:  BTN  AFG 2008
#  4:  IND  AFG 2003
#  5:  IND  AFG 2004
#  6:  IND  AFG 2005
#  7:  IND  AFG 2006
#  8:  IND  AFG 2007
#  9:  IND  AFG 2008
# 10:  BGD  AFG 2006
# 11:  BGD  AFG 2007
# 12:  BGD  AFG 2008

CodePudding user response:

One way using dplyr and tidyr libraries.

library(dplyr)
library(tidyr)

interpolate_rows <- function(dt, stop_year = 2008)  {
  dt %>%
    group_by(iso1, iso2) %>%
    complete(year = year : stop_year) %>%
    ungroup
}

interpolate_rows(test_dt)

#  iso1  iso2   year
#   <chr> <chr> <dbl>
# 1 BGD   AFG    2006
# 2 BGD   AFG    2007
# 3 BGD   AFG    2008
# 4 BTN   AFG    2006
# 5 BTN   AFG    2007
# 6 BTN   AFG    2008
# 7 IND   AFG    2003
# 8 IND   AFG    2004
# 9 IND   AFG    2005
#10 IND   AFG    2006
#11 IND   AFG    2007
#12 IND   AFG    2008

Another way -

library(data.table)

interpolate_rows <- function(dt, stop_year = 2008)  {
  vals <- seq(dt$year, stop_year)
  dt[rep(1, length(vals))][, year := vals]
}

rbindlist(by(test_dt, seq(nrow(test_dt)), interpolate_rows))
  • Related