I have a dataset with population numbers by country, disaggregated by gender and age brackets. The data is recorded in 5 year increments. I would like to interpolate the data linearly to have annual data.
My data looks like this:
I have figured out how to do the interpolation for one column. This is the code I used:
n <- 1
j <- c()
while(n < nrow(pop)){
a <- as.data.frame(approx(x = pop$`5year`[n:(n 4)], y = pop$`Male_0-4`[n:(n 4)], xout = 2000:2020))
j <- as.matrix(c(j,a$y))
n <- n 5
}
Here, j produces one long vector that would represent the relevant column in the new, interpolated dataset. I would therefore like to produce one such vector for each age-gender combo (i.e. column in the original dataset) and bind them together.
However, I cannot figure out how to do that for all my columns at once. Neither my attempts at functions, for loops or apply family members has worked. I'm thankful for any input.
CodePudding user response:
I'd write a function to interpolate an arbitrary variable within country group, and then map()
it over all the variables and join them back together.
library(tidyverse)
data("population")
# create some data to interpolate
population_5 <- population %>%
filter(year %% 5 == 0) %>%
mutate(female_pop = population / 2,
male_pop = population / 2)
interpolate_func <- function(variable, data) {
data %>%
group_by(country) %>%
# can't interpolate if only one year
filter(n() >= 2) %>%
group_modify(~as_tibble(approx(.x$year, .x[[variable]],
xout = min(.x$year):max(.x$year)))) %>%
set_names("country", "year", paste0(variable, "_interpolated")) %>%
ungroup()
}
vars_to_interpolate <- names(select(population_5, -country, -year))
map(vars_to_interpolate, interpolate_func,
data = population_5) %>%
reduce(full_join, by = c("country", "year"))
#> # A tibble: 3,395 × 5
#> country year population_interpolated female_pop_interp… male_pop_interp…
#> <chr> <int> <dbl> <dbl> <dbl>
#> 1 Afghanistan 1995 17586073 8793036. 8793036.
#> 2 Afghanistan 1996 18187930. 9093965. 9093965.
#> 3 Afghanistan 1997 18789788. 9394894. 9394894.
#> 4 Afghanistan 1998 19391645. 9695823. 9695823.
#> 5 Afghanistan 1999 19993503. 9996751. 9996751.
#> 6 Afghanistan 2000 20595360 10297680 10297680
#> 7 Afghanistan 2001 21448459 10724230. 10724230.
#> 8 Afghanistan 2002 22301558 11150779 11150779
#> 9 Afghanistan 2003 23154657 11577328. 11577328.
#> 10 Afghanistan 2004 24007756 12003878 12003878
#> # … with 3,385 more rows
Created on 2022-06-01 by the reprex package (v2.0.1)