Below is my attempt at a minimal reproducible example. Briefly, I am using rollApply() from the rowr package to apply a function over a rolling window, using data from two columns simultaneously. If possible, I would like to skip n steps between consecutive windows. The example below should make clear what I mean.
Here is the example data:
df1 <- tibble(
  x = c(1:9),
  y = c(1:9),
  Date = as.Date(c("2015-08-08", "2015-08-15", "2015-08-22",
                   "2015-08-29", "2015-09-05", "2015-09-12", "2015-09-19",
                   "2015-09-26", "2015-10-03"))
)
Here are the example functions:
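# Sum of the first two columns over the rows of the current window
calc_ex <- function(y){
  sum(y[,1] + y[,2])
}
# Pad the first two rows with NA, since a window of 3 is not yet complete there
roll_calc_ex <- function(y){
  vec <- c(rep(NA, 2), rowr::rollApply(y, calc_ex, window = 3, minimum = 3))
  y <- y %>%
    mutate(estimate = vec)
  return(y)
}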
Applying the function roll_calc_ex() to df1, I get the following output:
> roll_calc_ex(df1)
# A tibble: 9 x 4
x y Date estimate
<int> <int> <date> <int>
1 1 1 2015-08-08 NA
2 2 2 2015-08-15 NA
3 3 3 2015-08-22 12
4 4 4 2015-08-29 18
5 5 5 2015-09-05 24
6 6 6 2015-09-12 30
7 7 7 2015-09-19 36
8 8 8 2015-09-26 42
9 9 9 2015-10-03 48
Ideally, I would like to have a rolling window that skips n steps, say n=2, to produce the following output:
# A tibble: 9 x 4
x y Date estimate
<int> <int> <date> <int>
1 1 1 2015-08-08 NA
2 2 2 2015-08-15 NA
3 3 3 2015-08-22 12
4 4 4 2015-08-29 NA
5 5 5 2015-09-05 NA
6 6 6 2015-09-12 30
7 7 7 2015-09-19 NA
8 8 8 2015-09-26 NA
9 9 9 2015-10-03 48
Alternatively, instead of returning NA for every skipped row, the value from the previous calculation could be filled in (something I am planning to do later anyway using fill() from the tidyverse).
If this can be solved using, for example, rollapply() from the zoo package, that would also be interesting to hear. I am only using rowr::rollApply() because I need to apply the function to two columns simultaneously. I know it is possible to use runner() from the runner package, but in my more complicated problem I need to run parallel computations. I am using the furrr package for parallelization, and my code works well with rollApply(), but not with runner(). The problem I have with runner is explained here: "Problem with parallelization using furrr [and runner::runner()] in R".
Thanks to anyone that took the time to read this post. Any help will be much appreciated.
User response:
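One option is the slider package, whose slide2_dbl() takes two inputs at once and has a .step argument for skipping rows between evaluations: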
library(tidyverse)
library(slider)
df1 <- tibble(
x = c(1:9),
y = c(1:9),
Date = as.Date(c("2015-08-08", "2015-08-15", "2015-08-22",
"2015-08-29","2015-09-05", "2015-09-12", "2015-09-19",
"2015-09-26", "2015-10-03")))
df1 |>
  mutate(rolling_sum = slide2_dbl(.x = x, .y = y, .f = sum,
                                  .step = 3, .before = 2, .complete = TRUE))
#> # A tibble: 9 x 4
#> x y Date rolling_sum
#> <int> <int> <date> <dbl>
#> 1 1 1 2015-08-08 NA
#> 2 2 2 2015-08-15 NA
#> 3 3 3 2015-08-22 12
#> 4 4 4 2015-08-29 NA
#> 5 5 5 2015-09-05 NA
#> 6 6 6 2015-09-12 30
#> 7 7 7 2015-09-19 NA
#> 8 8 8 2015-09-26 NA
#> 9 9 9 2015-10-03 48
Created on 2021-10-21 by the reprex package (v2.0.1)
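For readers who cannot install slider, the same stepping idea can be sketched in base R. This is only an illustration under stated assumptions: roll_by_step is a hypothetical helper (it is not part of slider, zoo, or rowr), it uses right-aligned windows, and calc_ex is the question's function with a `+` assumed between the two columns.

```r
# Hypothetical helper: apply f to right-aligned windows of `width` rows,
# moving `step` rows between evaluations; all other rows stay NA.
roll_by_step <- function(data, f, width, step) {
  n <- nrow(data)
  out <- rep(NA_real_, n)
  if (n < width) return(out)  # no complete window available
  ends <- seq(from = width, to = n, by = step)
  for (end in ends) {
    rows <- (end - width + 1):end
    out[end] <- f(data[rows, , drop = FALSE])
  }
  out
}

# The question's function, assuming the two columns are added
calc_ex <- function(y) sum(y[, 1] + y[, 2])

df1 <- data.frame(x = 1:9, y = 1:9)
df1$estimate <- roll_by_step(df1[, c("x", "y")], calc_ex, width = 3, step = 3)
df1
```

With width = 3 and step = 3 the function is evaluated at rows 3, 6 and 9, which is the spacing requested in the question. A plain loop like this is easy to reason about, but it will likely be slower than slider or zoo on large data.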
User response:
1) The rowr package was removed from CRAN, but we can use rollapplyr (like rollapply, but the r on the end means it defaults to right alignment) from zoo. It has a by.column= argument to specify whether processing is performed column by column (TRUE) or all columns are passed at once (FALSE), and a by= argument which causes skipping.
library(dplyr)
library(zoo)
mutate(df1, roll =
rollapplyr(cbind(x, y), 3, calc_ex, fill = NA, by.column = FALSE, by = 2)
)
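giving the following (with by = 2 the function is evaluated at every second row; by = 3 would reproduce the spacing requested in the question):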
x y Date roll
1 1 1 2015-08-08 NA
2 2 2 2015-08-15 NA
3 3 3 2015-08-22 12
4 4 4 2015-08-29 NA
5 5 5 2015-09-05 24
6 6 6 2015-09-12 NA
7 7 7 2015-09-19 36
8 8 8 2015-09-26 NA
9 9 9 2015-10-03 48
2) Using complex arithmetic would also work:
f <- function(v) calc_ex(cbind(Re(v), Im(v)))
mutate(df1, roll = rollapplyr(x + y * 1i, 3, f, fill = NA, by = 2))
3) and if we look into calc_ex, then it could be written as follows (although this does not generalize):
mutate(df1, roll = rollapplyr(x + y, 3, sum, fill = NA, by = 2))
4) We could also consider using zoo objects rather than data frames:
z <- read.zoo(df1, index = "Date")
merge(z, roll = rollapplyr(z, 3, calc_ex, by.column = FALSE, by = 2))
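The question also mentions filling the skipped rows with the previous result. A minimal base R sketch of that step follows; fill_down is a hypothetical helper name introduced here for illustration, and the input vector is the desired output from the question. On a data frame, tidyr::fill() or zoo::na.locf(x, na.rm = FALSE) should give the same result.

```r
# Hypothetical helper: carry the last non-NA value forward.
# Leading NAs (before the first computed value) are left as NA.
fill_down <- function(v) {
  idx <- cumsum(!is.na(v))        # index of the most recent non-NA value
  c(NA, v[!is.na(v)])[idx + 1]    # position 1 holds NA for "none seen yet"
}

roll <- c(NA, NA, 12, NA, NA, 30, NA, NA, 48)
filled <- fill_down(roll)
filled
```

The first two rows remain NA because no window has been completed before them.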