I have a data frame of two variables. The first variable is the year spanning continuously from 1 to 2016, the second variable is my value of interest. E.g.
ts <- data.frame(Year=c(1:2016), TS=sample(seq(from=-1.5, to=1.5,by=0.01), size=2016, replace=TRUE))
I need to compute 21 moving average values starting from 50 years before the last observation (2016-50=1966) each encompassing a window that ranges from 10, 11, 12, ...30 years, with the first value aligned to the year 2016. What I need is a data frame with 23 columns ("Year", "TS", "av10", "av11", "av12", ..."av30") and 2016 rows from year 1 to the year 2016. Therefore, I did this:
subset1966 <- ts[1:1966,]
subset1966$Year.50 <- c(51:2016)
ts.50yrs <- subset1966%>%
mutate(av10 = rollmean(TS, k = 10, fill = NA, align = "right"),
av11 = rollmean(TS, k = 11, fill = NA, align = "right"),
av12 = rollmean(TS, k = 12, fill = NA, align = "right"),
av13 = rollmean(TS, k = 13, fill = NA, align = "right"),
av14 = rollmean(TS, k = 14, fill = NA, align = "right"),
av15 = rollmean(TS, k = 15, fill = NA, align = "right"),
av16 = rollmean(TS, k = 16, fill = NA, align = "right"),
av17 = rollmean(TS, k = 17, fill = NA, align = "right"),
av18 = rollmean(TS, k = 18, fill = NA, align = "right"),
av19 = rollmean(TS, k = 19, fill = NA, align = "right"),
av20 = rollmean(TS, k = 20, fill = NA, align = "right"),
av21 = rollmean(TS, k = 21, fill = NA, align = "right"),
av22 = rollmean(TS, k = 22, fill = NA, align = "right"),
av23 = rollmean(TS, k = 23, fill = NA, align = "right"),
av24 = rollmean(TS, k = 24, fill = NA, align = "right"),
av25 = rollmean(TS, k = 25, fill = NA, align = "right"),
av26 = rollmean(TS, k = 26, fill = NA, align = "right"),
av27 = rollmean(TS, k = 27, fill = NA, align = "right"),
av28 = rollmean(TS, k = 28, fill = NA, align = "right"),
av29 = rollmean(TS, k = 29, fill = NA, align = "right"),
av30 = rollmean(TS, k = 30, fill = NA, align = "right"))
ts.50yrs.df <- ts.50yrs[,-c(1:2)]
colnames(ts.50yrs.df)[1] <- "Year"
df.ts.50 <- full_join(ts,ts.50yrs.df)
I repeated the same procedure for different years starting 100, 150, 200 and 250 years before the last observation. E.g.
subset1766 <- data[1:1766,]
subset1766$Year.250 <- c(251:2016)
ts.250yrs <- subset1766%>%
mutate(av10 = rollmean(TS, k = 10, fill = NA, align = "right"),
av11 = rollmean(TS, k = 11, fill = NA, align = "right"),
av12 = rollmean(TS, k = 12, fill = NA, align = "right"),
av13 = rollmean(TS, k = 13, fill = NA, align = "right"),
av14 = rollmean(TS, k = 14, fill = NA, align = "right"),
av15 = rollmean(TS, k = 15, fill = NA, align = "right"),
av16 = rollmean(TS, k = 16, fill = NA, align = "right"),
av17 = rollmean(TS, k = 17, fill = NA, align = "right"),
av18 = rollmean(TS, k = 18, fill = NA, align = "right"),
av19 = rollmean(TS, k = 19, fill = NA, align = "right"),
av20 = rollmean(TS, k = 20, fill = NA, align = "right"),
av21 = rollmean(TS, k = 21, fill = NA, align = "right"),
av22 = rollmean(TS, k = 22, fill = NA, align = "right"),
av23 = rollmean(TS, k = 23, fill = NA, align = "right"),
av24 = rollmean(TS, k = 24, fill = NA, align = "right"),
av25 = rollmean(TS, k = 25, fill = NA, align = "right"),
av26 = rollmean(TS, k = 26, fill = NA, align = "right"),
av27 = rollmean(TS, k = 27, fill = NA, align = "right"),
av28 = rollmean(TS, k = 28, fill = NA, align = "right"),
av29 = rollmean(TS, k = 29, fill = NA, align = "right"),
av30 = rollmean(TS, k = 30, fill = NA, align = "right"))
ts.250yrs.df <- ts.250yrs[,-c(1:2)]
colnames(ts.250yrs.df)[1] <- "Year"
df.ts.250 <- full_join(ts,ts.50yrs.df)
And then I merged the data frames together. Eg.:
df <- cbind(df.ts.50, df.ts.100, df.ts.150, df.ts.200, df.ts.250)
However, what I would like to do next is to repeat the same moving average with the start year that goes from 50, 51, 52, ...250, meaning starting in 1966 (2016-50) and going until 1766 (2016-250) and having a moving average ranging from 10, 11, 12, ...30 years for each starting year. Additionally, all the values would need to be aligned to the year 2016.
So, it would be a data frame with 4221 columns (201 (= years from 50 to 250) * 21 (= windows from 10 to 30 years)), plus the two columns of "Year" and "TS". The column name would be something like "Year", "TS", "av50.10yr", "av50.11yr", "av50.12yr", ..., "av50.30yr", "av51.10yr", "av51.11yr", "av51.12yr", ... "av51.30yr", ... "av250.10yr", "av250.11yr", "av250.12yr", ..., "av250.30yr") and 2016 rows from year 1 to the year 2016.
CodePudding user response:
Here, I created a custom function, moving_mean()
, that takes the dataframe and starting year. Then, I used mutate
to create a new column with the starting year sequence (e.g., 50) through 2016. Then, I created a list of the names that will be used for column names in the dataframe (e.g., av50.10yr
). Then, I used map2
to mutate
new columns for the number of rolling years (e.g., 10, 11, ..., 30). Then, I used purrr::reduce
to combine all of the dataframes into one for a given starting year (e.g., 50). Next, I used map
to run the function, moving_mean
, on each starting year from 50 to 250 on the original dataframe, ts
. Then, I put the 201 dataframes into the same dataframe (again using reduce
). Finally, I joined this dataframe to the original, which added 4,221 new columns to ts
.
library(tidyverse)
moving_mean <- function(data, y){
new_year <- paste0("Year.", as.character(y))
x_new <- data %>%
filter(Year <= (tail(Year, n=1) - y)) %>%
rowwise %>%
dplyr::mutate({{new_year}} := (Year y)) %>%
ungroup()
varnames <- unlist(map(10:30, function(x) paste0("av", y, ".", x, "yr")))
map2(10:30, varnames, function(x, y) x_new %>%
dplyr::mutate({{y}} := rollmean(TS, k = x, fill = NA, align = "right")) %>%
as.data.frame()) %>%
reduce(left_join, by = c(names(x_new)[1:3])) %>%
select(-c("Year", "TS")) %>%
dplyr::rename(Year = 1)
}
results <- map(c(seq(50, 250, 1)), ~ moving_mean(ts, .x)) %>%
reduce(left_join, by = "Year") %>%
left_join(ts, ., by = "Year")
Output (Only printing small portion here)
dim(results)
# [1] 2016 4223
results[60:70, 1:15]
Year TS av50.10yr av50.11yr av50.12yr av50.13yr av50.14yr av50.15yr av50.16yr av50.17yr av50.18yr av50.19yr av50.20yr av50.21yr av50.22yr
60 60 0.63 -0.080 NA NA NA NA NA NA NA NA NA NA NA NA
61 61 -1.50 0.123 -0.01727273 NA NA NA NA NA NA NA NA NA NA NA
62 62 -0.07 0.295 0.15545455 0.02416667 NA NA NA NA NA NA NA NA NA NA
63 63 -0.84 0.338 0.30000000 0.17166667 0.04923077 NA NA NA NA NA NA NA NA NA
64 64 -0.36 0.317 0.39363636 0.35416667 0.23153846 0.1135714 NA NA NA NA NA NA NA NA
65 65 0.34 0.442 0.29000000 0.36250000 0.32846154 0.2164286 0.1073333 NA NA NA NA NA NA NA
66 66 -0.45 0.312 0.35363636 0.22166667 0.29384615 0.2671429 0.1666667 0.067500 NA NA NA NA NA NA
67 67 -1.17 0.415 0.25000000 0.29333333 0.17615385 0.2464286 0.2246667 0.133125 0.04176471 NA NA NA NA NA
68 68 -0.77 0.355 0.42363636 0.27166667 0.31000000 0.2000000 0.2640000 0.242500 0.15529412 0.06777778 NA NA NA NA
69 69 0.51 0.477 0.45090909 0.50583333 0.35923077 0.3885714 0.2806667 0.335625 0.31117647 0.22500000 0.1384211 NA NA NA
70 70 -0.25 0.331 0.42272727 0.40333333 0.45769231 0.3250000 0.3546667 0.255625 0.30882353 0.28722222 0.2068421 0.1255 NA NA
Data
set.seed(28)
ts <- data.frame(Year = c(1:2016),
TS = sample(
seq(from = -1.5, to = 1.5, by = 0.01),
size = 2016,
replace = TRUE
))
CodePudding user response:
You could put the logic into a function makeAvg()
, that takes simply the number of years and subtracts from max(dat$year)
, i.e. 50
gives the desired 1:1966
,
# makeAvg <- \(s, ks=10:30, x='TS', data=dat) {
# sbs <- 1:(max(data$Year) - s)
# X <- data[sbs, ]
# cbind(X, sapply(ks, \(k) zoo::rollmean(X[[x]], k=k, fill=NA, align="right")) |>
# `colnames<-`(paste0('av', ks, '_', s)))
# }
(Edit: To avoid slow zoo::rollmean
, try this version below, that uses RcppRoll::roll_mean
which is about 20 times faster.)
makeAvg <- \(s, ks=10:30, x='TS', data=dat) {
sbs <- 1:(max(data$Year) - s)
X <- data[sbs, ]
cbind(X,
sapply(ks, \(k) RcppRoll::roll_mean(X[[x]], n=k, fill=NA, align='r')) |>
`colnames<-`(paste0('av', ks, '_', s)))
}
Note: R version >= 4.1
and that throws this:
makeAvg(50)[8:13, ]
# Year TS av10_50 av11_50 av12_50 av13_50 av14_50 av15_50 av16_50
# 8 8 -0.23 NA NA NA NA NA NA NA
# 9 9 -1.27 NA NA NA NA NA NA NA
# 10 10 -0.62 -0.448 NA NA NA NA NA NA
# 11 11 0.14 -0.332 -0.3945455 NA NA NA NA NA
# 12 12 -0.41 -0.375 -0.3390909 -0.3958333 NA NA NA NA
# 13 13 -1.31 -0.429 -0.4600000 -0.4200000 -0.4661538 NA NA NA
# av17_50 av18_50 av19_50 av20_50 av21_50 av22_50 av23_50 av24_50 av25_50 av26_50
# 8 NA NA NA NA NA NA NA NA NA NA
# 9 NA NA NA NA NA NA NA NA NA NA
# 10 NA NA NA NA NA NA NA NA NA NA
# 11 NA NA NA NA NA NA NA NA NA NA
# 12 NA NA NA NA NA NA NA NA NA NA
# 13 NA NA NA NA NA NA NA NA NA NA
# av27_50 av28_50 av29_50 av30_50
# 8 NA NA NA NA
# 9 NA NA NA NA
# 10 NA NA NA NA
# 11 NA NA NA NA
# 12 NA NA NA NA
# 13 NA NA NA NA
Then put the starts
into a vector,
starts <- eq(50, 250, 1)
over which you loop using lapply
, and put it into Reduce()
with merge()
:
res <- Reduce(\(...) merge(..., all=TRUE), lapply(starts, makeAvg)) ## runs ~25 s
This gives a data frame with exactly the expected dimensions dimensions,
dim(res)
# [1] 1966 4223
and these names:
names(res)
# [1] "Year" "TS" "av10_50" "av11_50" "av12_50" "av13_50" "av14_50"
# [8] "av15_50" "av16_50" "av17_50" "av18_50" "av19_50" "av20_50" "av21_50"
# [...]
Data:
set.seed(42)
dat <- data.frame(Year=c(1:2016),
TS=sample(seq(from=-1.5, to=1.5,by=0.01), size=2016, replace=TRUE))