Home > Net >  Work out rolling sums for variables with non-consecutive days in a dataframe in R
Work out rolling sums for variables with non-consecutive days in a dataframe in R

Time:06-01

I have some data on which I'd like to work out the win % across a rolling 14 day window thoughout around 7 years worth of results. The days are non-consecutive so whenever I group by the 'Trainer' variable and run rollapplyr or runSum/ sum_run, I get the past 14 events summarised but can't figure out how to get the 14 days grouped. When I try to define the width or k value using the day from the date, I get an error

invalid time series parameters specified

or vec' must be sorted non-decreasingly and not contain NAs

Edit- Below code gives the error above

df %>% group_by(Trainer) %>% mutate(Fourteen_day_wins =             rollapplyr(Wins, width = 1:n() - findInterval( Date %d-% Days(14), Date), sum)) %>%  ungroup

I'd like to get the total wins and count of events for a 14 day rolling period when grouped by Trainer in new columns in my df. Could someone point me in the right direction please? Still an R novice so has stumped me so far!

Sample df:

structure(list(Trainer = c("Appleby, Charlie", "Haggas, W J",  "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie",  "Appleby, Charlie", "Haggas, W J", "Haggas, W J", "Haggas, W J",  "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie",  "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie",  "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J",  "Appleby, Charlie", "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J",  "Haggas, W J", "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie",  "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Appleby, Charlie",  "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J", "Haggas, W J", "Haggas, W J",  "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J", "Appleby, Charlie",  "Appleby, Charlie", "Haggas, W J", "Haggas, W J", "Haggas, W J",  "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie",  "Appleby, Charlie", "Appleby, Charlie", "Appleby, Charlie", "Appleby, Charlie",  "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J", "Appleby, Charlie",  "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie",  "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Appleby, Charlie",  "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Appleby, Charlie",  "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie",  "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J",  "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie",  "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J", "Haggas, W J",  "Appleby, Charlie", "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie",  "Haggas, W J"), Wins = c(1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0,  0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0,  0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0,  0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0,  1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0,  1, 0, 1, 0, 0), Date = structure(c(1508025600, 1508112000, 1508112000,  1508112000, 1508198400, 1508284800, 1508284800, 1508284800, 1508457600,  1508457600, 1508544000, 1508544000, 1508544000, 1508716800, 1508716800,  1508716800, 1508803200, 1508803200, 1508803200, 1508889600, 1508889600,  1508889600, 1508889600, 1508889600, 1508889600, 1508889600, 1509062400,  1509062400, 1509062400, 1509062400, 1509062400, 1509148800, 1509148800,  1509148800, 1509148800, 1509148800, 1509148800, 1509321600, 1509321600,  1509321600, 1509321600, 1509494400, 1509667200, 1509667200, 1509753600,  1509753600, 1509753600, 1509753600, 1509753600, 1509753600, 1509753600,  1510099200, 1510099200, 1510099200, 1510358400, 1510358400, 1510358400,  1521936000, 1521936000, 1523923200, 1523923200, 1523923200, 1524009600,  1524009600, 1524009600, 1524009600, 1524009600, 1524009600, 1524009600,  1524009600, 1524009600, 1524009600, 1524096000, 1524096000, 1524096000,  1524096000, 1524096000, 1524096000, 1524096000, 1524182400, 1524182400,  1524182400, 1524268800, 1524268800, 1524268800, 1524528000, 1524528000,  1524528000, 1524528000, 1524614400, 1524614400, 1524614400, 1524787200,  1524787200, 1524787200, 1524787200, 1524787200, 1525132800, 1525219200,  1525219200, 1525219200), tzone = "UTC", class = c("POSIXct",  "POSIXt"))), row.names = c(NA, -101L), class = c("tbl_df", "tbl",  "data.frame"))

CodePudding user response:

You could use complete to complete your data and then use 14 window period

df %>%
       group_by(Trainer) %>%
       complete(Date = seq(min(Date), max(Date), '1 day')) %>%
       mutate(runMeans = zoo::rollmean(Wins, 14,0,na.rm = TRUE))
    # A tibble: 459 x 4
    # Groups:   Trainer [2]
       Trainer          Date                 Wins runMeans
       <chr>            <dttm>              <dbl>    <dbl>
     1 Appleby, Charlie 2017-10-15 00:00:00     1    0    
     2 Appleby, Charlie 2017-10-16 00:00:00    NA    0    
     3 Appleby, Charlie 2017-10-17 00:00:00    NA    0    
     4 Appleby, Charlie 2017-10-18 00:00:00     1    0    
     5 Appleby, Charlie 2017-10-18 00:00:00     0    0    
     6 Appleby, Charlie 2017-10-19 00:00:00    NA    0    
     7 Appleby, Charlie 2017-10-20 00:00:00    NA    0.429
     8 Appleby, Charlie 2017-10-21 00:00:00    NA    0.429
     9 Appleby, Charlie 2017-10-22 00:00:00    NA    0.429
    10 Appleby, Charlie 2017-10-23 00:00:00     0    0.375

CodePudding user response:

The problem is that the arguments to findInterval should be numeric and ordered.

To address this convert the dates to Date class and then numeric so that d below is the number of days since the Epoch. Now we can use it with findInterval as shown. If the data were already sorted the arrange line could be omitted.

library(dplyr, exclude = c("filter", "lag"))
library(zoo)

DF %>%
  arrange(Trainer, Date) %>%
  group_by(Trainer) %>%
  mutate(d = as.numeric(as.Date(Date)), 
         Wins14 = rollapplyr(Wins, 1:n() - findInterval(d - 14, d), sum)) %>%
  ungroup %>%
  select(-d)

giving:

# A tibble: 101 x 4
   Trainer           Wins Date                Wins14
   <chr>            <dbl> <dttm>               <dbl>
 1 Appleby, Charlie     1 2017-10-15 00:00:00      1
 2 Appleby, Charlie     1 2017-10-18 00:00:00      2
 3 Appleby, Charlie     0 2017-10-18 00:00:00      2
 4 Appleby, Charlie     0 2017-10-23 00:00:00      2
 5 Appleby, Charlie     1 2017-10-25 00:00:00      3
 6 Appleby, Charlie     0 2017-10-25 00:00:00      3
 7 Appleby, Charlie     0 2017-10-25 00:00:00      3
 8 Appleby, Charlie     1 2017-10-25 00:00:00      4
 9 Appleby, Charlie     0 2017-10-27 00:00:00      4
10 Appleby, Charlie     0 2017-10-27 00:00:00      4
# ... with 91 more rows

CodePudding user response:

One option would be to make the combination of all days and all trainers, merge it with the original data and then use a 14-day window:

library(zoo)
#> 
#> Attaching package: 'zoo'
#> The following objects are masked from 'package:base':
#> 
#>     as.Date, as.Date.numeric
library(tidyverse)
df <- structure(list(Trainer = c("Appleby, Charlie", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie", "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Appleby, Charlie", "Appleby, Charlie", "Appleby, Charlie", "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J", "Appleby, Charlie", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie", "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J", "Haggas, W J", "Appleby, Charlie", "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J"), Wins = c(1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0), Date = structure(c(1508025600, 1508112000, 1508112000, 1508112000, 1508198400, 1508284800, 1508284800, 1508284800, 1508457600, 1508457600, 1508544000, 1508544000, 1508544000, 1508716800, 1508716800, 1508716800, 1508803200, 1508803200, 1508803200, 1508889600, 1508889600, 1508889600, 1508889600, 1508889600, 1508889600, 1508889600, 1509062400, 1509062400, 1509062400, 1509062400, 1509062400, 1509148800, 1509148800, 1509148800, 1509148800, 1509148800, 1509148800, 1509321600, 1509321600, 1509321600, 1509321600, 1509494400, 1509667200, 1509667200, 1509753600, 1509753600, 1509753600, 1509753600, 1509753600, 1509753600, 1509753600, 1510099200, 1510099200, 1510099200, 1510358400, 1510358400, 1510358400, 1521936000, 1521936000, 1523923200, 1523923200, 1523923200, 1524009600, 1524009600, 1524009600, 1524009600, 1524009600, 1524009600, 1524009600, 1524009600, 1524009600, 1524009600, 1524096000, 1524096000, 1524096000, 1524096000, 1524096000, 1524096000, 1524096000, 1524182400, 1524182400, 1524182400, 1524268800, 1524268800, 1524268800, 1524528000, 1524528000, 1524528000, 1524528000, 1524614400, 1524614400, 1524614400, 1524787200, 1524787200, 1524787200, 1524787200, 1524787200, 1525132800, 1525219200, 1525219200, 1525219200), tzone = "UTC", class = c("POSIXct", "POSIXt"))), row.names = c(NA, -101L), class = c("tbl_df", "tbl", "data.frame"))

all_dates <- with(df, expand_grid(Trainer = unique(Trainer), 
                                  Date = seq(min(Date), max(Date), by="1 day")))

all_dates <- left_join(all_dates, df)
#> Joining, by = c("Trainer", "Date")

all_dates %>% 
  group_by(Trainer) %>% 
  mutate(win_pct = rollapplyr(Wins, 
                              width=14, 
                              mean, 
                              partial = TRUE, 
                              align="right", 
                              na.rm=TRUE, 
                              fill=TRUE))
#> # A tibble: 460 × 4
#> # Groups:   Trainer [2]
#>    Trainer          Date                 Wins win_pct
#>    <chr>            <dttm>              <dbl>   <dbl>
#>  1 Appleby, Charlie 2017-10-15 00:00:00     1   1    
#>  2 Appleby, Charlie 2017-10-16 00:00:00    NA   1    
#>  3 Appleby, Charlie 2017-10-17 00:00:00    NA   1    
#>  4 Appleby, Charlie 2017-10-18 00:00:00     1   1    
#>  5 Appleby, Charlie 2017-10-18 00:00:00     0   0.667
#>  6 Appleby, Charlie 2017-10-19 00:00:00    NA   0.667
#>  7 Appleby, Charlie 2017-10-20 00:00:00    NA   0.667
#>  8 Appleby, Charlie 2017-10-21 00:00:00    NA   0.667
#>  9 Appleby, Charlie 2017-10-22 00:00:00    NA   0.667
#> 10 Appleby, Charlie 2017-10-23 00:00:00     0   0.5  
#> # … with 450 more rows

Created on 2022-05-31 by the reprex package (v2.0.1)

  •  Tags:  
  • r zoo
  • Related