Rolling 4 week average by categorical group in R or power BI-CodePudding

I am trying to create a rolling 4 week average of the Actuals column and grouped on the State.

So starting at the bottom, the 4 week average for TX in the final row would be, (1 0 (no sales in week of may22) 1 3) / 4 (number of weeks).

Week    State   Actuals
4/24/2022   CA  1
5/1/2022    CA  3
5/8/2022    CA  34
5/8/2022    NV  2
5/8/2022    AZ  1
5/8/2022    TX  3
5/15/2022   CA  27
5/15/2022   TX  1
5/22/2022   CA  15
5/29/2022   TX  1
5/29/2022   CA  24

I like R more than Power BI, but either would be great

CodePudding user response：

Use complete to include the weeks missing filling them with 0

 library(tidyverse)

 df %>%
   mutate(Week = as.Date(Week, "%m/%d/%Y")) %>%
   complete(Week, State, fill = list(Actuals = 0)) %>%
   group_by(State) %>%
   summarise(Results = zoo::rollmean(Actuals, 4), .groups = 'drop')

# A tibble: 12 x 2
   State Results
   <chr>   <dbl>
 1 AZ       0.25
 2 AZ       0.25
 3 AZ       0.25
 4 CA      16.2 
 5 CA      19.8 
 6 CA      25   
 7 NV       0.5 
 8 NV       0.5 
 9 NV       0.5 
10 TX       1   
11 TX       1   
12 TX       1.25

Edit:

If you want to maintain the week use stats::filter:

df %>%
     mutate(Week = as.Date(Week, "%m/%d/%Y")) %>%
     complete(State, Week, fill = list(Actuals = 0)) %>%
     group_by(State) %>%
     mutate(Results = stats::filter(Actuals, rep(1,4)/4, sides = 1))
   State       Week Actuals Results
1     AZ 2022-04-24       0      NA
2     AZ 2022-05-01       0      NA
3     AZ 2022-05-08       1      NA
4     AZ 2022-05-15       0    0.25
5     AZ 2022-05-22       0    0.25
6     AZ 2022-05-29       0    0.25
7     CA 2022-04-24       1      NA
8     CA 2022-05-01       3      NA
9     CA 2022-05-08      34      NA
10    CA 2022-05-15      27   16.25
11    CA 2022-05-22      15   19.75
12    CA 2022-05-29      24   25.00
13    NV 2022-04-24       0      NA
14    NV 2022-05-01       0      NA
15    NV 2022-05-08       2      NA
16    NV 2022-05-15       0    0.50
17    NV 2022-05-22       0    0.50
18    NV 2022-05-29       0    0.50
19    TX 2022-04-24       0      NA
20    TX 2022-05-01       0      NA
21    TX 2022-05-08       3      NA
22    TX 2022-05-15       1    1.00
23    TX 2022-05-22       0    1.00
24    TX 2022-05-29       1    1.25

CodePudding user response：

Using your data :

x <- "
Week    State   Actuals
4/24/2022   CA  1
5/1/2022    CA  3
5/8/2022    CA  34
5/8/2022    NV  2
5/8/2022    AZ  1
5/8/2022    TX  3
5/15/2022   CA  27
5/15/2022   TX  1
5/22/2022   CA  15
5/29/2022   TX  1
5/29/2022   CA  24
"
df <- read.table(textConnection(x) , header = TRUE)
df

calculating averages every 4 weeks from the beginning

#===================================

library(dplyr , warn.conflicts = FALSE)

df$Week <- as.Date(df$Week , format = "%m/%d/%y")

uw <- unique(df$Week)

weeks <- list()
for(i in 1:ceiling(length(uw)/4)){
  s <- i*4 - 3
  weeks[[i]] <- df %>% filter( Week %in% uw[s:(s 3)]) %>% group_by(State) %>%
    summarise(Avg = format(round(mean(Actuals) , 3) , 3))
}
#===================================

the first 4 weeks :

weeks[[1]] 

#> # A tibble: 4 × 2
#>   State   Avg
#>   <chr> <dbl>
#> 1 AZ      1  
#> 2 CA     16.25
#> 3 NV      2  
#> 4 TX      2

the second 4 weeks :

weeks[[2]]

#> # A tibble: 2 × 2
#>   State   Avg
#>   <chr> <dbl>
#> 1 CA     19.5
#> 2 TX      1

^{Created on 2022-06-08 by the reprex package (v2.0.1)}