I have a data frame containing football matches with columns for:
- Matchday (date)
- Home Team
- Away Team
I want to calculate how long it has been since a team last played. That means I have to find the latest appearance of Team and get the date, i.e. Matchday.
I really have no idea how to do it, except maybe converting the structure to a long format. But as for matching based on team and grabbing the matchday I am lost.
Current structure:
Matchday | Team | Home_Away |
---|---|---|
2022-08-05 | Arsenal FC | Away |
2022-08-05 | Crystal Palace | Home |
... | ... | ... |
2022-08-13 | Arsenal FC | Home |
2022-08-13 | Leicester City | Away |
So I want a column indicating rest days basically:
Matchday | Team | Home_Away | Rest_Days |
---|---|---|---|
2022-08-05 | Arsenal FC | Away | NA |
2022-08-05 | Crystal Palace | Home | NA |
... | ... | ... | ... |
2022-08-07 | Leicester City | Home | NA |
... | ... | ... | ... |
2022-08-13 | Arsenal FC | Home | 8 |
2022-08-13 | Leicester City | Away | 6 |
CodePudding user response:
You need to sort the dataframe by team and date (date in ascending order). Then you create a column with the lagged value of date within the Team. For instance with dplyr:
library(dplyr)
data <-
data %>%
group_by(Team) %>%
mutate(rest_days = Matchday - dplyr::lag(Matchday, n = 1, default = NA))
I could not check if that works exactly in that manner, but the idea (sort, group and then lag) should work..