I'm attempting to keep track of a team's cumulative travel throughout a season, but having trouble trying to figure out the best way to do that since teams change who is home/away throughout the season.
Here is my dataframe:
df <- data.frame(game_id = c('1', '2', '3'),
Home_Abbr = c('PHI', 'DAL', 'PIT'),
Away_Abbr = c('DAL', 'PIT', 'PHI'),
Home_Travel = c('0', '0', '0'),
Away_Travel = c('1000', '500', '750'))
Here is what I would like:
game_id | Home_Abbr | Away_Abbr | Home_Travel | Away_Travel | Home_Cum_Travel | Away_Cum_Travel |
---|---|---|---|---|---|---|
1 | PHI | DAL | 0 | 1000 | 0 | 1000 |
2 | DAL | PIT | 0 | 500 | 1000 | 500 |
3 | PIT | PHI | 0 | 750 | 500 | 750 |
CodePudding user response:
This is not quite what you wanted, but for me part of the problem is the wide format of your data which makes it kind of hard to handle in R.
If it's actually just the computation which matters to you and not the shape of the resulting df I would reshape it to a longer format. In this longer format, every row represents a game of one certain team while the variable status
keeps track if the team is the away or home team.
After reshaping your data, you can just group_by the variable team
and cumsum()
over the travel distance. In the finale dataframe the variable cum_travel
gives the travelled distance of the team after the matchday.
library(tidyverse)
df <- data.frame(game_id = c('1', '2', '3'),
Home_Abbr = c('PHI', 'DAL', 'PIT'),
Away_Abbr = c('DAL', 'PIT', 'PHI'),
Home_Travel = c(0, 0, 0),
Away_Travel = c(1000, 500, 750))
tidyr::pivot_longer(df, cols = c(Home_Abbr, Away_Abbr),
names_to = c("status"),
values_to = "team") %>%
mutate(travel = if_else(status == "Home_Abbr", Home_Travel, Away_Travel)) %>%
select(-Home_Travel) %>%
mutate(status = if_else(status == "Home_Abbr", "home", "away")) %>%
group_by(team) %>%
mutate(cum_travel = cumsum(travel))
#> # A tibble: 6 x 6
#> # Groups: team [3]
#> game_id Away_Travel status team travel cum_travel
#> <fct> <dbl> <chr> <fct> <dbl> <dbl>
#> 1 1 1000 home PHI 0 0
#> 2 1 1000 away DAL 1000 1000
#> 3 2 500 home DAL 0 1000
#> 4 2 500 away PIT 500 500
#> 5 3 750 home PIT 0 500
#> 6 3 750 away PHI 750 750
Created on 2022-05-26 by the reprex package (v0.3.0)
CodePudding user response:
I think you could try using both pivot_longer
and pivot_wider
from tidyverse
. The former will give you long format data making it easier to obtain a cumulative sum of Travel
for each team. The latter puts back to original wide form.
library(tidyverse)
df %>%
pivot_longer(cols = -game_id, names_to = c("location", ".value"), names_sep = "_", values_transform = list(Travel = as.numeric)) %>%
group_by(Abbr) %>%
mutate(Cum_Travel = cumsum(Travel)) %>%
pivot_wider(id_cols = game_id, names_from = location, values_from = c(Abbr, Travel, Cum_Travel))
Output
game_id Abbr_Home Abbr_Away Travel_Home Travel_Away Cum_Travel_Home Cum_Travel_Away
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 1 PHI DAL 0 1000 0 1000
2 2 DAL PIT 0 500 1000 500
3 3 PIT PHI 0 750 500 750