I'm attempting to see if the amount of days rest a team has compared to their opponent has an effect on the outcome of a game. Here is the information I'm starting with:
library(tidyverse)
library(lubridate)
schedule <- data.frame(Home = c('DAL', 'KC', 'DAL', 'OAK'),
Away = c('OAK', 'PHI', 'PHI', 'KC'),
Home_Final = c(30, 21, 28, 14),
Away_Final = c(35, 28, 7, 21),
Date = c('9/1/2015', '9/2/2015', '9/9/2015', '9/9/2015')
)
I'm able to calculate the days off for a team if I filter down to just them, shown here:
schedule <- schedule %>% filter(Home == 'PHI' |
Away == 'PHI')
day_dif = interval(lag(mdy(schedule$Date)),
mdy(schedule$Date))
schedule <- schedule %>%
mutate(days_off = (time_length(day_dif, "days")) - 1)
But what I really need is to get something that looks like this:
Home | Away | Home_Final | Away_Final | Date | Home_Rest | Away_Rest | Adv | Adv_Days | Adv_Won |
---|---|---|---|---|---|---|---|---|---|
DAL | OAK | 30 | 35 | 9/1/2015 | null | null | null | null | null |
KC | PHI | 21 | 28 | 9/2/2015 | null | null | null | null | null |
DAL | PHI | 28 | 7 | 9/9/2015 | 8 | 7 | 1 | 1 | 1 |
OAK | KC | 14 | 21 | 9/9/2015 | 8 | 7 | 1 | 1 | 0 |
'Home_Rest' = The home teams amount of days between their games
'Away Rest' = The away teams amount of days between their games
'Adv' = True/False that there was an advantage on one side
'Adv_Days' = The amount of advantage in days
'Adv_Won' = The side with the advantage won
Which I can't get if I'm filtering down to individual teams and not figuring out a good way to group the teams together. I'm just really lost as to how to take this further than just one team and how to group these properly to look at an entire season.
Any help would be greatly appreciated!
CodePudding user response:
This is a classic example of pivot_longer then pivot_wider.
For comfort and confidence, we will add an index, game_id
.
schedule$game_id <- 1:nrow(schedule)
tschedule <- schedule %>%
mutate(Date=as.Date(Date, format="%m/%d/%Y")) %>%
pivot_longer(cols=c(Home, Away), names_to="Road", values_to="Team") %>%
group_by(Team) %>%
mutate(lagDate = lag(Date)) %>%
mutate(Rest=Date-lagDate)
rest_schedule <- tschedule %>%
pivot_wider(id_cols=c(game_id, Date, Home_Final, Away_Final), names_from=Road, values_from = c(Team, Rest))
Now you can calculate whatever you want using the rest_schedule
.
rest_schedule$Adv <- rest_schedule$Rest_Home != rest_schedule$Rest_Away
rest_schedule
# A tibble: 4 × 9
game_id Date Home_Final Away_Final Team_Home Team_Away Rest_Home Rest_Away Adv
<int> <date> <dbl> <dbl> <chr> <chr> <drtn> <drtn> <lgl>
1 1 2015-09-01 30 35 DAL OAK NA days NA days NA
2 2 2015-09-02 21 28 KC PHI NA days NA days NA
3 3 2015-09-09 28 7 DAL PHI 8 days 7 days TRUE
4 4 2015-09-09 14 21 OAK KC 8 days 7 days TRUE