Calculating a windowed weighted moving average where each observation has its own weight-CodePudding

I'm trying to calculate a windowed weighted moving average, with a window value of n. I essentially have a set of products, a price for each id, and a reference. I then calculate the ratio of the price with respect to the reference. (price = 45, reference = 45, distance = 1) From this calculation I then obtain a weight for each observation. I would like to compare results when doing a simple moving average of all the prices and a weighted moving average.

library(tidyverse)
df <- tibble(id = c(1:15),
       price = c(40,50,34,56,78,35,23,40,50,34,56,78,35,23,12),
       product = c(sample(c("A","B"), 15, replace = TRUE)),
       reference = 45,
       distance = price / reference)

max_weight = 1
min_weight = 0

max_distance = 1
min_distance = 0

df <- df %>%
  mutate(weight = case_when(
    distance < 1 ~ (min_weight * (min_distance - distance)   max_weight * (distance - max_distance)) / (min_distance - max_distance) ,
    TRUE ~ 1
  )
  )


> df %>%
    head()
# A tibble: 6 x 6
     id price product reference distance weight
  <int> <dbl> <chr>       <dbl>    <dbl>  <dbl>
1     1    40 B              45    0.889  0.111
2     2    50 B              45    1.11   1    
3     3    34 B              45    0.756  0.244
4     4    56 A              45    1.24   1    
5     5    78 B              45    1.73   1    
6     6    35 A              45    0.778  0.222

Moving Average calculation:

moving_average <- function(x,n){stats::filter(x, c(0, rep(1/n,n)), sides=1) }

df <- df %>%
  group_by(product) %>%
  mutate(moving_average = moving_average(price, n =3))


> df %>%
    head()
# A tibble: 6 x 7
# Groups:   product [2]
     id price product reference distance weight moving_average
  <int> <dbl> <chr>       <dbl>    <dbl>  <dbl>          <dbl>
1     1    40 B              45    0.889  0.111           NA  
2     2    50 B              45    1.11   1               NA  
3     3    34 B              45    0.756  0.244           NA  
4     4    56 A              45    1.24   1               NA  
5     5    78 B              45    1.73   1               41.3
6     6    35 A              45    0.778  0.222           NA

Essentially, the last step would be to calculate a moving average such that:

sum(price_i * weight_i) / sum(weight_i)

And this calculation would only take place in the given window. I can already create a column that calculates sum(price_i * weight_i), but I'm stuck when it comes to correctly adding those values n times, and then dividing them by the corresponding n weights, where n is the moving average window. Any ideas?

CodePudding user response：

I think zoo::rollapplyr should work here. Here's a simple n=2 window,

MA <- function(X) {
  if (!is.matrix(X)) X <- matrix(X, nrow = 1)
  Hmisc::wtd.mean(X[,1], X[,2])
}
df %>%
  group_by(product) %>%
  mutate(n2 = zoo::rollapplyr(
    cbind(price, weight), 2, MA,
          by.column = FALSE, partial = TRUE)
  ) %>%
  ungroup()
# # A tibble: 15 x 7
#       id price product reference distance weight    n2
#    <int> <dbl> <chr>       <dbl>    <dbl>  <dbl> <dbl>
#  1     1    40 B              45    0.889  0.111  40  
#  2     2    50 B              45    1.11   1      49  
#  3     3    34 A              45    0.756  0.244  34  
#  4     4    56 B              45    1.24   1      53  
#  5     5    78 B              45    1.73   1      67  
#  6     6    35 B              45    0.778  0.222  70.2
#  7     7    23 B              45    0.511  0.489  26.8
#  8     8    40 B              45    0.889  0.111  26.1
#  9     9    50 A              45    1.11   1      46.9
# 10    10    34 B              45    0.756  0.244  35.9
# 11    11    56 B              45    1.24   1      51.7
# 12    12    78 A              45    1.73   1      64  
# 13    13    35 B              45    0.778  0.222  52.2
# 14    14    23 B              45    0.511  0.489  26.8
# 15    15    12 A              45    0.267  0.733  50.1

And here's a method demonstrating multiple windows in one call:

df %>%
  group_by(product) %>%
  mutate(
    data.frame(lapply(
      setNames(2:4, paste0("n", 2:4)),
      function(n) zoo::rollapplyr(
        cbind(price, weight), n, MA,
        by.column = FALSE, partial = TRUE)
    ))
  ) %>%
  ungroup()
# # A tibble: 15 x 9
#       id price product reference distance weight    n2    n3    n4
#    <int> <dbl> <chr>       <dbl>    <dbl>  <dbl> <dbl> <dbl> <dbl>
#  1     1    40 B              45    0.889  0.111  40    40    40  
#  2     2    50 B              45    1.11   1      49    49    49  
#  3     3    34 A              45    0.756  0.244  34    34    34  
#  4     4    56 B              45    1.24   1      53    52.3  52.3
#  5     5    78 B              45    1.73   1      67    61.3  60.6
#  6     6    35 B              45    0.778  0.222  70.2  63.8  59.5
#  7     7    23 B              45    0.511  0.489  26.8  56.7  56.4
#  8     8    40 B              45    0.889  0.111  26.1  28.5  55.7
#  9     9    50 A              45    1.11   1      46.9  46.9  46.9
# 10    10    34 B              45    0.756  0.244  35.9  28.4  29.8
# 11    11    56 B              45    1.24   1      51.7  50.7  43.4
# 12    12    78 A              45    1.73   1      64    60.7  60.7
# 13    13    35 B              45    0.778  0.222  52.2  49.2  48.5
# 14    14    23 B              45    0.511  0.489  26.8  43.8  42.6
# 15    15    12 A              45    0.267  0.733  50.1  50.0  48.7

This method takes advantage of the not-well-known behavior of mutate with an unname argument that returns a data.frame. The use of setNames is so that the column names are meaningfully named, there are likely other ways one might approach that.

There's not a particular reason I'm using Hmisc::wtd.mean over a custom function other than I know it works well. The use of the MA function is because within zoo::rollapply*, the FUN= argument is passed a single matrix, so we need to handle it specially, even more so because due to partial=TRUE, the first time MA is called for each group, it is passed a vector instead of a matrix.

CodePudding user response：

like this?

## example data
df <- 
data.frame(
    price = 1:20,
    window = c(4,10,6) %>% rep(., times = .),
    weight = rnorm(20)
)

store sum of next n = window values in new column:

df %>%
    rowwise %>%
    mutate(
        mov_avg = price * weight,
        sum_next_weights = sum(lead(.$weight, window), na.rm = TRUE)
    )

proceed with calculations