Home > Software design >  Moving average along a time series starting at different years and with different window length
Moving average along a time series starting at different years and with different window length

Time:12-29

I have a data frame of two variables. The first variable is the year spanning continuously from 1 to 2016, the second variable is my value of interest. E.g.

ts <- data.frame(Year=c(1:2016), TS=sample(seq(from=-1.5, to=1.5,by=0.01), size=2016, replace=TRUE))

I need to compute 21 moving average values starting from 50 years before the last observation (2016-50=1966) each encompassing a window that ranges from 10, 11, 12, ...30 years, with the first value aligned to the year 2016. What I need is a data frame with 23 columns ("Year", "TS", "av10", "av11", "av12", ..."av30") and 2016 rows from year 1 to the year 2016. Therefore, I did this:

subset1966 <- ts[1:1966,]
subset1966$Year.50 <- c(51:2016)

ts.50yrs <- subset1966%>%
  mutate(av10 = rollmean(TS, k = 10, fill = NA, align = "right"),
         av11 = rollmean(TS, k = 11, fill = NA, align = "right"),
         av12 = rollmean(TS, k = 12, fill = NA, align = "right"),
         av13 = rollmean(TS, k = 13, fill = NA, align = "right"),
         av14 = rollmean(TS, k = 14, fill = NA, align = "right"),
         av15 = rollmean(TS, k = 15, fill = NA, align = "right"),
         av16 = rollmean(TS, k = 16, fill = NA, align = "right"),
         av17 = rollmean(TS, k = 17, fill = NA, align = "right"),
         av18 = rollmean(TS, k = 18, fill = NA, align = "right"),
         av19 = rollmean(TS, k = 19, fill = NA, align = "right"),
         av20 = rollmean(TS, k = 20, fill = NA, align = "right"),
         av21 = rollmean(TS, k = 21, fill = NA, align = "right"),
         av22 = rollmean(TS, k = 22, fill = NA, align = "right"),
         av23 = rollmean(TS, k = 23, fill = NA, align = "right"),
         av24 = rollmean(TS, k = 24, fill = NA, align = "right"),
         av25 = rollmean(TS, k = 25, fill = NA, align = "right"),
         av26 = rollmean(TS, k = 26, fill = NA, align = "right"),
         av27 = rollmean(TS, k = 27, fill = NA, align = "right"),
         av28 = rollmean(TS, k = 28, fill = NA, align = "right"),
         av29 = rollmean(TS, k = 29, fill = NA, align = "right"),
         av30 = rollmean(TS, k = 30, fill = NA, align = "right"))

ts.50yrs.df <- ts.50yrs[,-c(1:2)]
colnames(ts.50yrs.df)[1] <- "Year"
df.ts.50 <- full_join(ts,ts.50yrs.df)

I repeated the same procedure for different years starting 100, 150, 200 and 250 years before the last observation. E.g.

subset1766 <- data[1:1766,]
subset1766$Year.250 <- c(251:2016)
ts.250yrs <- subset1766%>%
  mutate(av10 = rollmean(TS, k = 10, fill = NA, align = "right"),
         av11 = rollmean(TS, k = 11, fill = NA, align = "right"),
         av12 = rollmean(TS, k = 12, fill = NA, align = "right"),
         av13 = rollmean(TS, k = 13, fill = NA, align = "right"),
         av14 = rollmean(TS, k = 14, fill = NA, align = "right"),
         av15 = rollmean(TS, k = 15, fill = NA, align = "right"),
         av16 = rollmean(TS, k = 16, fill = NA, align = "right"),
         av17 = rollmean(TS, k = 17, fill = NA, align = "right"),
         av18 = rollmean(TS, k = 18, fill = NA, align = "right"),
         av19 = rollmean(TS, k = 19, fill = NA, align = "right"),
         av20 = rollmean(TS, k = 20, fill = NA, align = "right"),
         av21 = rollmean(TS, k = 21, fill = NA, align = "right"),
         av22 = rollmean(TS, k = 22, fill = NA, align = "right"),
         av23 = rollmean(TS, k = 23, fill = NA, align = "right"),
         av24 = rollmean(TS, k = 24, fill = NA, align = "right"),
         av25 = rollmean(TS, k = 25, fill = NA, align = "right"),
         av26 = rollmean(TS, k = 26, fill = NA, align = "right"),
         av27 = rollmean(TS, k = 27, fill = NA, align = "right"),
         av28 = rollmean(TS, k = 28, fill = NA, align = "right"),
         av29 = rollmean(TS, k = 29, fill = NA, align = "right"),
         av30 = rollmean(TS, k = 30, fill = NA, align = "right"))

ts.250yrs.df <- ts.250yrs[,-c(1:2)]
colnames(ts.250yrs.df)[1] <- "Year"
df.ts.250 <- full_join(ts,ts.50yrs.df)

And then I merged the data frames together. Eg.:

df <- cbind(df.ts.50, df.ts.100, df.ts.150, df.ts.200, df.ts.250)

However, what I would like to do next is to repeat the same moving average with the start year that goes from 50, 51, 52, ...250, meaning starting in 1966 (2016-50) and going until 1766 (2016-250) and having a moving average ranging from 10, 11, 12, ...30 years for each starting year. Additionally, all the values would need to be aligned to the year 2016.

So, it would be a data frame with 4221 columns (201 (= years from 50 to 250) * 21 (= windows from 10 to 30 years)), plus the two columns of "Year" and "TS". The column name would be something like "Year", "TS", "av50.10yr", "av50.11yr", "av50.12yr", ..., "av50.30yr", "av51.10yr", "av51.11yr", "av51.12yr", ... "av51.30yr", ... "av250.10yr", "av250.11yr", "av250.12yr", ..., "av250.30yr") and 2016 rows from year 1 to the year 2016.

CodePudding user response:

Here, I created a custom function, moving_mean(), that takes the dataframe and starting year. Then, I used mutate to create a new column with the starting year sequence (e.g., 50) through 2016. Then, I created a list of the names that will be used for column names in the dataframe (e.g., av50.10yr). Then, I used map2 to mutate new columns for the number of rolling years (e.g., 10, 11, ..., 30). Then, I used purrr::reduce to combine all of the dataframes into one for a given starting year (e.g., 50). Next, I used map to run the function, moving_mean, on each starting year from 50 to 250 on the original dataframe, ts. Then, I put the 201 dataframes into the same dataframe (again using reduce). Finally, I joined this dataframe to the original, which added 4,221 new columns to ts.

library(tidyverse)

moving_mean <- function(data, y){
  new_year <- paste0("Year.", as.character(y))
  
  x_new <- data %>% 
    filter(Year <= (tail(Year, n=1) - y)) %>% 
    rowwise %>% 
    dplyr::mutate({{new_year}} := (Year   y)) %>% 
    ungroup()
  
  varnames <- unlist(map(10:30, function(x) paste0("av", y, ".", x, "yr")))
  
  map2(10:30, varnames, function(x, y) x_new %>% 
         dplyr::mutate({{y}} := rollmean(TS, k = x, fill = NA, align = "right")) %>% 
         as.data.frame()) %>% 
    reduce(left_join, by = c(names(x_new)[1:3])) %>% 
    select(-c("Year", "TS")) %>% 
    dplyr::rename(Year = 1)
}

results <- map(c(seq(50, 250, 1)), ~ moving_mean(ts, .x)) %>% 
  reduce(left_join, by = "Year") %>% 
  left_join(ts, ., by = "Year")

Output (Only printing small portion here)

dim(results)
# [1] 2016 4223

results[60:70, 1:15]

   Year    TS av50.10yr   av50.11yr  av50.12yr  av50.13yr av50.14yr av50.15yr av50.16yr  av50.17yr  av50.18yr av50.19yr av50.20yr av50.21yr av50.22yr
60   60  0.63    -0.080          NA         NA         NA        NA        NA        NA         NA         NA        NA        NA        NA        NA
61   61 -1.50     0.123 -0.01727273         NA         NA        NA        NA        NA         NA         NA        NA        NA        NA        NA
62   62 -0.07     0.295  0.15545455 0.02416667         NA        NA        NA        NA         NA         NA        NA        NA        NA        NA
63   63 -0.84     0.338  0.30000000 0.17166667 0.04923077        NA        NA        NA         NA         NA        NA        NA        NA        NA
64   64 -0.36     0.317  0.39363636 0.35416667 0.23153846 0.1135714        NA        NA         NA         NA        NA        NA        NA        NA
65   65  0.34     0.442  0.29000000 0.36250000 0.32846154 0.2164286 0.1073333        NA         NA         NA        NA        NA        NA        NA
66   66 -0.45     0.312  0.35363636 0.22166667 0.29384615 0.2671429 0.1666667  0.067500         NA         NA        NA        NA        NA        NA
67   67 -1.17     0.415  0.25000000 0.29333333 0.17615385 0.2464286 0.2246667  0.133125 0.04176471         NA        NA        NA        NA        NA
68   68 -0.77     0.355  0.42363636 0.27166667 0.31000000 0.2000000 0.2640000  0.242500 0.15529412 0.06777778        NA        NA        NA        NA
69   69  0.51     0.477  0.45090909 0.50583333 0.35923077 0.3885714 0.2806667  0.335625 0.31117647 0.22500000 0.1384211        NA        NA        NA
70   70 -0.25     0.331  0.42272727 0.40333333 0.45769231 0.3250000 0.3546667  0.255625 0.30882353 0.28722222 0.2068421    0.1255        NA        NA

Data

set.seed(28)
ts <- data.frame(Year = c(1:2016),
                  TS = sample(
                    seq(from = -1.5, to = 1.5, by = 0.01),
                    size = 2016,
                    replace = TRUE
                  ))

CodePudding user response:

You could put the logic into a function makeAvg(), that takes simply the number of years and subtracts from max(dat$year), i.e. 50 gives the desired 1:1966,

# makeAvg <- \(s, ks=10:30, x='TS', data=dat) {
#   sbs <- 1:(max(data$Year) - s)
#   X <- data[sbs, ]
#   cbind(X, sapply(ks, \(k) zoo::rollmean(X[[x]], k=k, fill=NA, align="right")) |>
#           `colnames<-`(paste0('av', ks, '_', s)))
# }

(Edit: To avoid slow zoo::rollmean, try this version below, that uses RcppRoll::roll_mean which is about 20 times faster.)

makeAvg <- \(s, ks=10:30, x='TS', data=dat) {
  sbs <- 1:(max(data$Year) - s)
  X <- data[sbs, ]
  cbind(X, 
        sapply(ks, \(k) RcppRoll::roll_mean(X[[x]], n=k, fill=NA, align='r')) |>
          `colnames<-`(paste0('av', ks, '_', s)))
}

Note: R version >= 4.1

and that throws this:

makeAvg(50)[8:13, ]
#    Year    TS av10_50    av11_50    av12_50    av13_50 av14_50 av15_50 av16_50
# 8     8 -0.23      NA         NA         NA         NA      NA      NA      NA
# 9     9 -1.27      NA         NA         NA         NA      NA      NA      NA
# 10   10 -0.62  -0.448         NA         NA         NA      NA      NA      NA
# 11   11  0.14  -0.332 -0.3945455         NA         NA      NA      NA      NA
# 12   12 -0.41  -0.375 -0.3390909 -0.3958333         NA      NA      NA      NA
# 13   13 -1.31  -0.429 -0.4600000 -0.4200000 -0.4661538      NA      NA      NA
#    av17_50 av18_50 av19_50 av20_50 av21_50 av22_50 av23_50 av24_50 av25_50 av26_50
# 8       NA      NA      NA      NA      NA      NA      NA      NA      NA      NA
# 9       NA      NA      NA      NA      NA      NA      NA      NA      NA      NA
# 10      NA      NA      NA      NA      NA      NA      NA      NA      NA      NA
# 11      NA      NA      NA      NA      NA      NA      NA      NA      NA      NA
# 12      NA      NA      NA      NA      NA      NA      NA      NA      NA      NA
# 13      NA      NA      NA      NA      NA      NA      NA      NA      NA      NA
#    av27_50 av28_50 av29_50 av30_50
# 8       NA      NA      NA      NA
# 9       NA      NA      NA      NA
# 10      NA      NA      NA      NA
# 11      NA      NA      NA      NA
# 12      NA      NA      NA      NA
# 13      NA      NA      NA      NA

Then put the starts into a vector,

starts <- eq(50, 250, 1)

over which you loop using lapply, and put it into Reduce() with merge():

res <- Reduce(\(...) merge(..., all=TRUE), lapply(starts, makeAvg))  ## runs ~25 s

This gives a data frame with exactly the expected dimensions dimensions,

dim(res)
# [1] 1966 4223

and these names:

names(res)
# [1] "Year"     "TS"       "av10_50"  "av11_50"  "av12_50"  "av13_50"  "av14_50" 
# [8] "av15_50"  "av16_50"  "av17_50"  "av18_50"  "av19_50"  "av20_50"  "av21_50" 
# [...]

Data:

set.seed(42)
dat <- data.frame(Year=c(1:2016),
                  TS=sample(seq(from=-1.5, to=1.5,by=0.01), size=2016, replace=TRUE))
  • Related