Home > Back-end >  Comparing dates between games to find possible advantages
Comparing dates between games to find possible advantages

Time:05-16

I'm attempting to see if the amount of days rest a team has compared to their opponent has an effect on the outcome of a game. Here is the information I'm starting with:

library(tidyverse)
library(lubridate)

schedule <- data.frame(Home = c('DAL', 'KC', 'DAL', 'OAK'),
                       Away = c('OAK', 'PHI', 'PHI', 'KC'),
                       Home_Final = c(30, 21, 28, 14),
                       Away_Final = c(35, 28, 7, 21),
                       Date = c('9/1/2015', '9/2/2015', '9/9/2015', '9/9/2015')
                       )

I'm able to calculate the days off for a team if I filter down to just them, shown here:

schedule <- schedule %>% filter(Home == 'PHI' |
                                     Away == 'PHI')

day_dif = interval(lag(mdy(schedule$Date)),
                   mdy(schedule$Date))

schedule <- schedule %>%
    mutate(days_off = (time_length(day_dif, "days")) - 1)

But what I really need is to get something that looks like this:

Home Away Home_Final Away_Final Date Home_Rest Away_Rest Adv Adv_Days Adv_Won
DAL OAK 30 35 9/1/2015 null null null null null
KC PHI 21 28 9/2/2015 null null null null null
DAL PHI 28 7 9/9/2015 8 7 1 1 1
OAK KC 14 21 9/9/2015 8 7 1 1 0

'Home_Rest' = The home teams amount of days between their games

'Away Rest' = The away teams amount of days between their games

'Adv' = True/False that there was an advantage on one side

'Adv_Days' = The amount of advantage in days

'Adv_Won' = The side with the advantage won

Which I can't get if I'm filtering down to individual teams and not figuring out a good way to group the teams together. I'm just really lost as to how to take this further than just one team and how to group these properly to look at an entire season.

Any help would be greatly appreciated!

CodePudding user response:

This is a classic example of pivot_longer then pivot_wider. For comfort and confidence, we will add an index, game_id.

schedule$game_id <- 1:nrow(schedule)

tschedule <- schedule %>% 
  mutate(Date=as.Date(Date, format="%m/%d/%Y")) %>%
  pivot_longer(cols=c(Home, Away), names_to="Road", values_to="Team") %>%
  group_by(Team) %>% 
  mutate(lagDate = lag(Date)) %>% 
  mutate(Rest=Date-lagDate)

rest_schedule <- tschedule %>% 
  pivot_wider(id_cols=c(game_id, Date, Home_Final, Away_Final), names_from=Road, values_from = c(Team, Rest))

Now you can calculate whatever you want using the rest_schedule.

rest_schedule$Adv <- rest_schedule$Rest_Home != rest_schedule$Rest_Away
rest_schedule
# A tibble: 4 × 9
  game_id Date       Home_Final Away_Final Team_Home Team_Away Rest_Home Rest_Away Adv  
    <int> <date>          <dbl>      <dbl> <chr>     <chr>     <drtn>    <drtn>    <lgl>
1       1 2015-09-01         30         35 DAL       OAK       NA days   NA days   NA   
2       2 2015-09-02         21         28 KC        PHI       NA days   NA days   NA   
3       3 2015-09-09         28          7 DAL       PHI        8 days    7 days   TRUE 
4       4 2015-09-09         14         21 OAK       KC         8 days    7 days   TRUE 
  • Related