Home > Net >  R interpolate rows with different start and end columns
R interpolate rows with different start and end columns

Time:10-15

I have multiple data sets that look like the following:

country 2000 2001 2002 2003 2004 2005 2006 2007
Germany 0.1 0.2 0.5 NA 0.7 0.6 0.8 NA
USA NA NA NA 0.1 0.4 NA 0.4 0.1
France 0.4 0.3 0.5 0.8 NA NA 1.0 1.3
Belgium NA 0.1 NA 0.5 0.6 0.5 NA NA

I want to interpolate horizontally, starting from the first existing value until the last existing value. In essence, NAs at the start and NAs at the end should stay NAs, but NAs in the middle should be interpolated. I have been trying to think of a possible solution, but nothing comes up. I appreciate your help!

CodePudding user response:

You can use zoo::na.approx:

df[-1] <- t(apply(df[-1], 1, \(x) na.approx(x, na.rm = F)))

or even without apply:

replace(df, -1, t(na.approx(t(df[-1]))))

output

  country `2000` `2001` `2002` `2003` `2004` `2005` `2006` `2007`
1 Germany    0.1    0.2    0.5    0.6  0.7    0.6      0.8   NA  
2 USA       NA     NA     NA      0.1  0.4    0.4      0.4    0.1
3 France     0.4    0.3    0.5    0.8  0.867  0.933    1      1.3
4 Belgium   NA      0.1    0.3    0.5  0.6    0.5     NA     NA  

Note that a tidy format might be easier for further wrangling here:

library(dplyr)
library(tidyr)
df %>% 
  mutate(id = 1:n()) %>% 
  pivot_longer(-c(country, id)) %>% 
  group_by(id) %>% 
  mutate(value = na.approx(value, na.rm = F))
  • Related