Home > OS >  Keeping track of cumulative travel using dplyr
Keeping track of cumulative travel using dplyr

Time:05-26

I'm attempting to keep track of a team's cumulative travel throughout a season, but having trouble trying to figure out the best way to do that since teams change who is home/away throughout the season.

Here is my dataframe:

df <- data.frame(game_id = c('1', '2', '3'),
                 Home_Abbr = c('PHI', 'DAL', 'PIT'),
                 Away_Abbr = c('DAL', 'PIT', 'PHI'),
                 Home_Travel = c('0', '0', '0'),
                 Away_Travel = c('1000', '500', '750'))

Here is what I would like:

game_id Home_Abbr Away_Abbr Home_Travel Away_Travel Home_Cum_Travel Away_Cum_Travel
1 PHI DAL 0 1000 0 1000
2 DAL PIT 0 500 1000 500
3 PIT PHI 0 750 500 750

CodePudding user response:

This is not quite what you wanted, but for me part of the problem is the wide format of your data which makes it kind of hard to handle in R.

If it's actually just the computation which matters to you and not the shape of the resulting df I would reshape it to a longer format. In this longer format, every row represents a game of one certain team while the variable status keeps track if the team is the away or home team.

After reshaping your data, you can just group_by the variable team and cumsum() over the travel distance. In the finale dataframe the variable cum_travel gives the travelled distance of the team after the matchday.

library(tidyverse)
df <- data.frame(game_id = c('1', '2', '3'),
                 Home_Abbr = c('PHI', 'DAL', 'PIT'),
                 Away_Abbr = c('DAL', 'PIT', 'PHI'),
                 Home_Travel = c(0, 0, 0),
                 Away_Travel = c(1000, 500, 750))

tidyr::pivot_longer(df, cols = c(Home_Abbr, Away_Abbr), 
                    names_to = c("status"), 
                    values_to = "team") %>%
  mutate(travel = if_else(status == "Home_Abbr", Home_Travel, Away_Travel)) %>% 
  select(-Home_Travel) %>%
  mutate(status = if_else(status == "Home_Abbr", "home", "away")) %>%
  group_by(team) %>% 
  mutate(cum_travel = cumsum(travel))

#> # A tibble: 6 x 6
#> # Groups:   team [3]
#>   game_id Away_Travel status team  travel cum_travel
#>   <fct>         <dbl> <chr>  <fct>  <dbl>      <dbl>
#> 1 1              1000 home   PHI        0          0
#> 2 1              1000 away   DAL     1000       1000
#> 3 2               500 home   DAL        0       1000
#> 4 2               500 away   PIT      500        500
#> 5 3               750 home   PIT        0        500
#> 6 3               750 away   PHI      750        750

Created on 2022-05-26 by the reprex package (v0.3.0)

CodePudding user response:

I think you could try using both pivot_longer and pivot_wider from tidyverse. The former will give you long format data making it easier to obtain a cumulative sum of Travel for each team. The latter puts back to original wide form.

library(tidyverse)

df %>%
  pivot_longer(cols = -game_id, names_to = c("location", ".value"), names_sep = "_", values_transform = list(Travel = as.numeric)) %>%
  group_by(Abbr) %>%
  mutate(Cum_Travel = cumsum(Travel)) %>%
  pivot_wider(id_cols = game_id, names_from = location, values_from = c(Abbr, Travel, Cum_Travel))

Output

  game_id Abbr_Home Abbr_Away Travel_Home Travel_Away Cum_Travel_Home Cum_Travel_Away
  <chr>   <chr>     <chr>           <dbl>       <dbl>           <dbl>           <dbl>
1 1       PHI       DAL                 0        1000               0            1000
2 2       DAL       PIT                 0         500            1000             500
3 3       PIT       PHI                 0         750             500             750
  • Related