I'm trying to calculate a windowed weighted moving average with a window of size n. I have a set of products, a price for each id, and a reference value. I then calculate the ratio of the price to the reference, which I call distance (e.g. price = 45, reference = 45, distance = 1). From this ratio I derive a weight for each observation. I would like to compare the results of a simple moving average of the prices with those of a weighted moving average.
library(tidyverse)
df <- tibble(id = c(1:15),
price = c(40,50,34,56,78,35,23,40,50,34,56,78,35,23,12),
product = c(sample(c("A","B"), 15, replace = TRUE)),
reference = 45,
distance = price / reference)
max_weight = 1
min_weight = 0
max_distance = 1
min_distance = 0
df <- df %>%
mutate(weight = case_when(
distance < 1 ~ (min_weight * (min_distance - distance) + max_weight * (distance - max_distance)) / (min_distance - max_distance) ,
TRUE ~ 1
)
)
> df %>%
head()
# A tibble: 6 x 6
id price product reference distance weight
<int> <dbl> <chr> <dbl> <dbl> <dbl>
1 1 40 B 45 0.889 0.111
2 2 50 B 45 1.11 1
3 3 34 B 45 0.756 0.244
4 4 56 A 45 1.24 1
5 5 78 B 45 1.73 1
6 6 35 A 45 0.778 0.222
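To make the weighting rule explicit, here is a small self-contained sketch of the same linear interpolation written as a function. The name interp_weight is made up for illustration; the default bounds are the ones set above.

```r
# Hypothetical helper: same interpolation as in the case_when() above.
interp_weight <- function(distance,
                          min_weight = 0, max_weight = 1,
                          min_distance = 0, max_distance = 1) {
  w <- (min_weight * (min_distance - distance) +
          max_weight * (distance - max_distance)) /
    (min_distance - max_distance)
  # Observations at or above the reference get the fixed weight 1.
  ifelse(distance < 1, w, 1)
}

# Prices 40, 50 and 34 against a reference of 45
interp_weight(c(40, 50, 34) / 45)
```

With these bounds the rule reduces to 1 - distance for distances below 1, which is where the 0.111 and 0.244 in the weight column come from.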
Moving Average calculation:
moving_average <- function(x,n){stats::filter(x, c(0, rep(1/n,n)), sides=1) }
df <- df %>%
group_by(product) %>%
mutate(moving_average = moving_average(price, n =3))
> df %>%
head()
# A tibble: 6 x 7
# Groups: product [2]
id price product reference distance weight moving_average
<int> <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
1 1 40 B 45 0.889 0.111 NA
2 2 50 B 45 1.11 1 NA
3 3 34 B 45 0.756 0.244 NA
4 4 56 A 45 1.24 1 NA
5 5 78 B 45 1.73 1 41.3
6 6 35 A 45 0.778 0.222 NA
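As a quick check of what this filter does: the leading 0 in the coefficients means the current value is excluded, so each result is the mean of the previous n values within the group. A minimal sketch using only the product B prices visible in the output above:

```r
# Same definition as above: the leading 0 drops the current value.
moving_average <- function(x, n) {
  stats::filter(x, c(0, rep(1 / n, n)), sides = 1)
}

b_prices <- c(40, 50, 34, 78)  # product B prices from rows 1, 2, 3 and 5
as.numeric(moving_average(b_prices, n = 3))
# The 4th value is mean(c(40, 50, 34)); the first three are NA.
```

That is why row 5 is the first B row with a value.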
Essentially, the last step would be to calculate a moving average such that:
sum(price_i * weight_i) / sum(weight_i)
And this calculation would only take place in the given window. I can already create a column that calculates sum(price_i * weight_i), but I'm stuck when it comes to correctly adding those values n times, and then dividing them by the corresponding n weights, where n is the moving average window. Any ideas?
CodePudding user response:
I think zoo::rollapplyr should work here. Here's a simple n=2 window:
MA <- function(X) {
if (!is.matrix(X)) X <- matrix(X, nrow = 1)
Hmisc::wtd.mean(X[,1], X[,2])
}
df %>%
group_by(product) %>%
mutate(n2 = zoo::rollapplyr(
cbind(price, weight), 2, MA,
by.column = FALSE, partial = TRUE)
) %>%
ungroup()
# # A tibble: 15 x 7
# id price product reference distance weight n2
# <int> <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
# 1 1 40 B 45 0.889 0.111 40
# 2 2 50 B 45 1.11 1 49
# 3 3 34 A 45 0.756 0.244 34
# 4 4 56 B 45 1.24 1 53
# 5 5 78 B 45 1.73 1 67
# 6 6 35 B 45 0.778 0.222 70.2
# 7 7 23 B 45 0.511 0.489 26.8
# 8 8 40 B 45 0.889 0.111 26.1
# 9 9 50 A 45 1.11 1 46.9
# 10 10 34 B 45 0.756 0.244 35.9
# 11 11 56 B 45 1.24 1 51.7
# 12 12 78 A 45 1.73 1 64
# 13 13 35 B 45 0.778 0.222 52.2
# 14 14 23 B 45 0.511 0.489 26.8
# 15 15 12 A 45 0.267 0.733 50.1
And here's a method demonstrating multiple windows in one call:
df %>%
group_by(product) %>%
mutate(
data.frame(lapply(
setNames(2:4, paste0("n", 2:4)),
function(n) zoo::rollapplyr(
cbind(price, weight), n, MA,
by.column = FALSE, partial = TRUE)
))
) %>%
ungroup()
# # A tibble: 15 x 9
# id price product reference distance weight n2 n3 n4
# <int> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 1 40 B 45 0.889 0.111 40 40 40
# 2 2 50 B 45 1.11 1 49 49 49
# 3 3 34 A 45 0.756 0.244 34 34 34
# 4 4 56 B 45 1.24 1 53 52.3 52.3
# 5 5 78 B 45 1.73 1 67 61.3 60.6
# 6 6 35 B 45 0.778 0.222 70.2 63.8 59.5
# 7 7 23 B 45 0.511 0.489 26.8 56.7 56.4
# 8 8 40 B 45 0.889 0.111 26.1 28.5 55.7
# 9 9 50 A 45 1.11 1 46.9 46.9 46.9
# 10 10 34 B 45 0.756 0.244 35.9 28.4 29.8
# 11 11 56 B 45 1.24 1 51.7 50.7 43.4
# 12 12 78 A 45 1.73 1 64 60.7 60.7
# 13 13 35 B 45 0.778 0.222 52.2 49.2 48.5
# 14 14 23 B 45 0.511 0.489 26.8 43.8 42.6
# 15 15 12 A 45 0.267 0.733 50.1 50.0 48.7
This method takes advantage of the not-well-known behavior of mutate with an unnamed argument that returns a data.frame: its columns are added to the result. setNames is used so that the new columns get meaningful names; there are likely other ways one might approach that.
There's no particular reason I'm using Hmisc::wtd.mean over a custom function other than that I know it works well. The MA wrapper is needed because, within zoo::rollapply*, the FUN= argument is passed a single matrix, so we need to handle it specially; even more so because, due to partial=TRUE, the first time MA is called for each group it is passed a vector instead of a matrix.
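As a sketch of the custom-function route, the same wrapper could probably be written with base R's stats::weighted.mean instead of Hmisc::wtd.mean. The name MA_base is made up here, and this assumes the weights in a window never sum to zero.

```r
# Hypothetical base-R variant of MA; column 1 = price, column 2 = weight.
MA_base <- function(X) {
  # A one-row window can arrive as a plain vector, so coerce it to a matrix.
  if (!is.matrix(X)) X <- matrix(X, nrow = 1)
  stats::weighted.mean(X[, 1], X[, 2])
}

MA_base(cbind(c(40, 50), c(0.111, 1)))  # full two-row window
MA_base(c(40, 0.111))                   # partial window passed as a vector
```

It should drop into the rollapplyr calls above in place of MA without other changes.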
CodePudding user response:
like this?
## example data
library(dplyr)  # provides %>%, mutate() and rowwise()

df <-
data.frame(
price = 1:20,
window = c(4,10,6) %>% rep(., times = .),
weight = rnorm(20)
)
Store the sum of the next n = window values in a new column:
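df %>%
mutate(row = row_number()) %>%
rowwise() %>%
mutate(
mov_avg = price * weight,
## weights of the `window` rows after the current one; rows past the end are NA and dropped
sum_next_weights = sum(.$weight[row + seq_len(window)], na.rm = TRUE)
)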
Then proceed with the calculations.
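One way the remaining calculation could go, as a rough base R sketch rather than a continuation of the code above: for each row, divide the windowed sum of price * weight by the windowed sum of weights. The name roll_wmean is made up, and a trailing window that includes the current row is an assumption (the moving_average in the question excludes it).

```r
# Hypothetical helper: trailing weighted mean over the last n observations.
roll_wmean <- function(price, weight, n) {
  vapply(seq_along(price), function(i) {
    idx <- max(1, i - n + 1):i  # trailing window, shorter at the start
    sum(price[idx] * weight[idx]) / sum(weight[idx])
  }, numeric(1))
}

roll_wmean(c(40, 50, 56), c(0.111, 1, 1), n = 2)
```

Applied per product, for example inside group_by(product) %>% mutate(...), this would give one weighted average per row and window size.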