I have multiple data sets that look like the following:
country | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 |
---|---|---|---|---|---|---|---|---|
Germany | 0.1 | 0.2 | 0.5 | NA | 0.7 | 0.6 | 0.8 | NA |
USA | NA | NA | NA | 0.1 | 0.4 | NA | 0.4 | 0.1 |
France | 0.4 | 0.3 | 0.5 | 0.8 | NA | NA | 1.0 | 1.3 |
Belgium | NA | 0.1 | NA | 0.5 | 0.6 | 0.5 | NA | NA |
I want to interpolate horizontally, starting from the first existing value until the last existing value. In essence, NAs at the start and NAs at the end should stay NAs, but NAs in the middle should be interpolated. I have been trying to think of a possible solution, but nothing comes up. I appreciate your help!
CodePudding user response:
You can use zoo::na.approx
:
df[-1] <- t(apply(df[-1], 1, \(x) na.approx(x, na.rm = F)))
or even without apply:
replace(df, -1, t(na.approx(t(df[-1]))))
output
country `2000` `2001` `2002` `2003` `2004` `2005` `2006` `2007`
1 Germany 0.1 0.2 0.5 0.6 0.7 0.6 0.8 NA
2 USA NA NA NA 0.1 0.4 0.4 0.4 0.1
3 France 0.4 0.3 0.5 0.8 0.867 0.933 1 1.3
4 Belgium NA 0.1 0.3 0.5 0.6 0.5 NA NA
Note that a tidy
format might be easier for further wrangling here:
library(dplyr)
library(tidyr)
df %>%
mutate(id = 1:n()) %>%
pivot_longer(-c(country, id)) %>%
group_by(id) %>%
mutate(value = na.approx(value, na.rm = F))