Home > Mobile >  R group columns of return trips data
R group columns of return trips data

Time:04-27

I have data of train trips and the number of delayed or cancelled trains that I would like to make the sum.

Start     End        Delayed Cancelled
Paris     Rome       1       0
Brussels  Berlin     4       6
Berlin    Brussels   6       2
Rome      Paris      2       1

How can I group the Start and End columns to make the sum of Paris-Rome and Rome-Paris and Brussels-Berlin and Berlin Brussels to make the total of delayed and cancelled trips by train line ?

CodePudding user response:

Order alphabetically and summarise by group:

df |> 
  group_by(route = if_else(Start < End, paste(Start, End, sep = "-"), paste(End, Start, sep = "-"))) |> 
  summarise(Delayed = sum(Delayed), Cancelled = sum(Cancelled))

#   route           Delayed Cancelled
#   <chr>             <int>     <int>
# 1 Berlin-Brussels      10         8
# 2 Paris-Rome            3         1

Reproducible data:

df = data.frame(
  Start     = c("Paris", "Brussels", "Berlin", "Rome"),
  End       = c("Rome", "Berlin", "Brussels", "Paris"),
  Delayed   = c(1L, 4L, 6L, 2L),
  Cancelled = c(0L, 6L, 2L, 1L)
)

CodePudding user response:

additional solution

tidyverse

library(tidyverse)

df <- data.frame(
  stringsAsFactors = FALSE,
             Start = c("Paris", "Brussels", "Berlin", "Rome"),
               End = c("Rome", "Berlin", "Brussels", "Paris"),
           Delayed = c(1L, 4L, 6L, 2L),
         Cancelled = c(0L, 6L, 2L, 1L)
)

df %>% 
  rowwise() %>% 
  mutate(route = paste0(sort(c_across(c(Start, End))), collapse = "-")) %>% 
  group_by(route) %>% 
  summarise(across(where(is.numeric), sum, na.rm = TRUE))


#> # A tibble: 2 × 3
#>   route           Delayed Cancelled
#>   <chr>             <int>     <int>
#> 1 Berlin-Brussels      10         8
#> 2 Paris-Rome            3         1

Created on 2022-04-26 by the reprex package (v2.0.1)

base

df$route <- apply(df[c("Start", "End")], 1, function(x) paste0(sort(x), collapse = "-"))

aggregate(x = df[c("Delayed", "Cancelled")], by = list(df$route), FUN = sum, na.rm = TRUE)
#>           Group.1 Delayed Cancelled
#> 1 Berlin-Brussels      10         8
#> 2      Paris-Rome       3         1

Created on 2022-04-26 by the reprex package (v2.0.1)

data.table

df$route <- apply(df[c("Start", "End")], 1, function(x) paste0(sort(x), collapse = "-"))

library(data.table)
setDT(df)[, lapply(.SD, sum, na.rm = TRUE), by = route, .SDcols = is.numeric]
#>              route Delayed Cancelled
#> 1:      Paris-Rome       3         1
#> 2: Berlin-Brussels      10         8

Created on 2022-04-26 by the reprex package (v2.0.1)

  • Related