I have a data set like this
ID departure_airport arrival_airport journey id flight is_outbound n
1 ZRH MSY 1 1 TRUE 2
2 MSY IAD 1 2 TRUE 2
3 ZRH LAX 2 1 TRUE 2
4 ZRH BUD 3 1 TRUE 1
5 VIE ZRH 4 1 TRUE 1
6 ZRH KEF 5 1 TRUE 2
7 KEF BOS 5 2 TRUE 2
I would like to duplicate the rows with a value of 2 in column "n" and invert their airport codes in the departure and arrival columns. Further, the values in "is_outbound" of the new created rows are to be put on FALSE and the numbering of the "flight" column should continue for the same journey_id. The output should look like this:
ID departure_airport arrival_airport journey id flight is_outbound n
1 ZRH MSY 1 1 TRUE 2
2 MSY IAD 1 2 TRUE 2
3 IAD MSY 1 3 FALSE 2
4 MSY ZRH 1 4 FALSE 2
5 ZRH LAX 2 1 TRUE 1
6 ZRH BUD 3 1 TRUE 1
7 VIE ZRH 4 1 TRUE 1
8 ZRH KEF 5 1 TRUE 2
9 KEF BOS 5 2 TRUE 2
10 BOS KEF 5 3 FALSE 2
11 KEF ZRH 5 4 FALSE 2
Any suggestions on how I can get the resulting table?
CodePudding user response:
Create a separate table, make the desired changes, bind the two tables by row and rearrange:
Libraries
require(vroom) # only need this to read in a copy/paste'd table
require(dplyr)
Reproducible data
Please use something like dput()
in future so that answerers don't need to do something like this:
flights <-
vroom(
"ID departure_airport arrival_airport journey_id flight is_outbound n
1 ZRH MSY 1 1 TRUE 2
2 MSY IAD 1 2 TRUE 2
3 ZRH LAX 2 1 TRUE 2
4 ZRH BUD 3 1 TRUE 1
5 VIE ZRH 4 1 TRUE 1
6 ZRH KEF 5 1 TRUE 2
7 KEF BOS 5 2 TRUE 2",
delim = " ")
Making new data frame
return_flights <-
flights |>
## Only rows where n is 2
filter(n == 2) |>
## Swap the departure and arrival column names
rename(
departure_airport = "arrival_airport",
arrival_airport = "departure_airport"
) |>
## Set is_outbound to FALSE
mutate(is_outbound = FALSE) |>
group_by(journey_id) |>
mutate(across(everything(), rev)) # flip the flight order across journeys
return_flights
#> # A tibble: 5 × 7
#> # Groups: journey_id [3]
#> ID arrival_airport departure_airport journey_id flight is_outbound n
#> <dbl> <chr> <chr> <dbl> <dbl> <lgl> <dbl>
#> 1 2 MSY IAD 1 2 FALSE 2
#> 2 1 ZRH MSY 1 1 FALSE 2
#> 3 3 ZRH LAX 2 1 FALSE 2
#> 4 7 KEF BOS 5 2 FALSE 2
#> 5 6 ZRH KEF 5 1 FALSE 2
## Bind rows of the two tables and rearrange
flights <-
rbind(flights, return_flights) |>
group_by(journey_id) |>
arrange(journey_id) |>
mutate(flight = row_number()) |> # recreate flight numbers
ungroup() |>
mutate(ID = row_number()) # recreate ID
flights
#> # A tibble: 12 × 7
#> ID departure_airport arrival_airport journey_id flight is_outbound n
#> <int> <chr> <chr> <dbl> <int> <lgl> <dbl>
#> 1 1 ZRH MSY 1 1 TRUE 2
#> 2 2 MSY IAD 1 2 TRUE 2
#> 3 3 IAD MSY 1 3 FALSE 2
#> 4 4 MSY ZRH 1 4 FALSE 2
#> 5 5 ZRH LAX 2 1 TRUE 2
#> 6 6 LAX ZRH 2 2 FALSE 2
#> 7 7 ZRH BUD 3 1 TRUE 1
#> 8 8 VIE ZRH 4 1 TRUE 1
#> 9 9 ZRH KEF 5 1 TRUE 2
#> 10 10 KEF BOS 5 2 TRUE 2
#> 11 11 BOS KEF 5 3 FALSE 2
#> 12 12 KEF ZRH 5 4 FALSE 2
Created on 2022-11-01 with reprex v2.0.2
CodePudding user response:
You can filter the part of the dataframe you want to duplicate, make the changes in the variables, and then add it to the original dataframe:
df_dup = df %>%
filter(n == 2) %>%
group_by(`journey id`) %>% # to get the number of flights in each journey with n() in the last line of mutate
mutate(departure_temp = arrival_airport, #see "Obs 1" below
arrival_airport = departure_airport,
departure_airport = departure_temp,
departure_temp = NULL,
is_outbound = FALSE,
flight = flight n()) #if there are 1 journey, 1 -> 2, if there are 2 journeys, 1 -> 3, 2 -> 4, etc.
df = rbind(df, df_dup) %>%
arrange(`journey id`) %>%
mutate(ID = rownames(.))
> df
# A tibble: 12 x 7
ID departure_airport arrival_airport `journey id` flight is_outbound n
<chr> <chr> <chr> <dbl> <dbl> <chr> <dbl>
1 1 ZRH MSY 1 1 TRUE 2
2 2 MSY IAD 1 2 TRUE 2
3 3 MSY ZRH 1 3 FALSE 2
4 4 IAD MSY 1 4 FALSE 2
5 5 ZRH LAX 2 1 TRUE 2
6 6 LAX ZRH 2 2 FALSE 2
7 7 ZRH BUD 3 1 TRUE 1
8 8 VIE ZRH 4 1 TRUE 1
9 9 ZRH KEF 5 1 TRUE 2
10 10 KEF BOS 5 2 TRUE 2
11 11 KEF ZRH 5 3 FALSE 2
12 12 BOS KEF 5 4 FALSE 2
Obs 1: mutate
changes the column in order, so arrival_airport = departure_airport
overrites the info in arrival_airport
, thus we need to save it in a temporary column to pass it later on to departure_airport
. There might be a better way to do this.