R rolling mean on a non-continuous time series

I want to compute a rolling mean over the last X days. rollmean() does this over rows. Because my loggers sometimes fail and the data were also cleaned, the time series is not continuous (consecutive rows are not necessarily separated by a constant time difference).

A colleague suggested the solution below, which works well, except that my data need to be grouped (by treatment in the example). For each day, I want the rolling mean of the last X days within each treatment.

Thanks

library(dplyr)

# making some example data
# vector with days since the beginning of the experiment
days <- 0:30

# random values
df1 <- tibble::tibble(
  days_since_beginning = days,
  value_to_used = rnorm(length(days)),
  treatment = sample(letters[1], 31, replace = TRUE)
)

df2 <- tibble::tibble(
  days_since_beginning = days,
  value_to_used = rnorm(length(days)),
  treatment = sample(letters[2], 31, replace = TRUE)
)

df <- dplyr::full_join(df1, df2)

# how long the averaging period should be
time_period <- 10 # calculate for the last 10 days

df_mean <- df %>%
  dplyr::mutate(
    # calculate rolling mean
    roll_mean = purrr::map_dbl(
      .x = days_since_beginning,
      .f = ~ df %>%
        # select only data for the last `time_period`
        # (this filters the whole df, so treatments are not kept separate)
        dplyr::filter(days_since_beginning >= .x - time_period &
                        days_since_beginning <= .x) %>%
        purrr::pluck("value_to_used") %>%
        mean()
    )
  )

CodePudding user response:

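This takes the mean over the last 10 days within each treatment. The width argument is computed per row as the number of rows to look back, so that each window corresponds to 10 days rather than 10 rows. This relies on the fact that the width argument of rollapplyr can be a vector. Note that findInterval counts the rows whose day is less than or equal to the current day minus 10, so each window covers the days after that point up to and including the current day; a row exactly 10 days earlier is excluded.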

library(dplyr)
library(zoo)

df %>%
  group_by(treatment) %>%
  mutate(roll = rollapplyr(
    value_to_used,
    # per-row width: rows up to the current one minus rows older than 10 days
    seq_along(days_since_beginning) -
      findInterval(days_since_beginning - 10, days_since_beginning),
    mean
  )) %>%
  ungroup()
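
To see how the width vector behaves when days are actually missing, here is a minimal sketch on a single group. The day vector, the values and the 3-day window are made up for illustration and are not from the question. It compares the rollapplyr result with a brute-force mean over the same day range, assuming days are sorted in increasing order as in the example data.

```r
library(zoo)

# Hypothetical logger data with gaps: days 3, 4 and 7 are missing.
days   <- c(0, 1, 2, 5, 6, 8, 9, 10)
values <- c(1, 2, 3, 4, 5, 6, 7, 8)
window <- 3  # illustrative window length in days

# Number of rows whose day is <= (current day - window), i.e. too old.
n_outside <- findInterval(days - window, days)

# Per-row width: rows up to the current one minus the rows that are too old.
widths <- seq_along(days) - n_outside

# Right-aligned rolling mean with a different width for each row.
roll <- rollapplyr(values, widths, mean)

# Brute-force reference: mean of values with day in (day - window, day].
reference <- sapply(days, function(d) {
  mean(values[days > d - window & days <= d])
})

# Side-by-side view of the widths and both results.
print(data.frame(days, values, widths, roll, reference))
```

On day 5, for instance, the width is 1 because days 3 and 4 are missing, whereas a fixed row-based width of 3 would have averaged days 1, 2 and 5. To also include the row exactly `window` days earlier, as the filter in the question does, one option would be to pass `left.open = TRUE` to findInterval.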