Home > other >  Duplication of specific rows based on a value of a column. In addition, continue and adjust the numb
Duplication of specific rows based on a value of a column. In addition, continue and adjust the numb

Time:11-01

I have a data set like this

ID     departure_airport   arrival_airport   journey id  flight  is_outbound n
1      ZRH                 MSY               1           1       TRUE        2
2      MSY                 IAD               1           2       TRUE        2
3      ZRH                 LAX               2           1       TRUE        2
4      ZRH                 BUD               3           1       TRUE        1 
5      VIE                 ZRH               4           1       TRUE        1
6      ZRH                 KEF               5           1       TRUE        2
7      KEF                 BOS               5           2       TRUE        2

I would like to duplicate the rows with a value of 2 in column "n" and invert their airport codes in the departure and arrival columns. Further, the values in "is_outbound" of the new created rows are to be put on FALSE and the numbering of the "flight" column should continue for the same journey_id. The output should look like this:

ID     departure_airport   arrival_airport   journey id  flight  is_outbound n
1      ZRH                 MSY               1           1       TRUE        2
2      MSY                 IAD               1           2       TRUE        2
3      IAD                 MSY               1           3       FALSE       2
4      MSY                 ZRH               1           4       FALSE       2
5      ZRH                 LAX               2           1       TRUE        1
6      ZRH                 BUD               3           1       TRUE        1
7      VIE                 ZRH               4           1       TRUE        1
8      ZRH                 KEF               5           1       TRUE        2
9      KEF                 BOS               5           2       TRUE        2
10     BOS                 KEF               5           3       FALSE       2
11     KEF                 ZRH               5           4       FALSE       2

Any suggestions on how I can get the resulting table?

CodePudding user response:

Create a separate table, make the desired changes, bind the two tables by row and rearrange:

Libraries

require(vroom) # only need this to read in a copy/paste'd table
require(dplyr)

Reproducible data

Please use something like dput() in future so that answerers don't need to do something like this:

flights <- 
  vroom(
"ID departure_airport arrival_airport journey_id flight is_outbound n
1 ZRH MSY 1 1 TRUE 2
2 MSY IAD 1 2 TRUE 2
3 ZRH LAX 2 1 TRUE 2
4 ZRH BUD 3 1 TRUE 1
5 VIE ZRH 4 1 TRUE 1
6 ZRH KEF 5 1 TRUE 2
7 KEF BOS 5 2 TRUE 2",
delim = " ")

Making new data frame

return_flights <- 
  flights |> 
  ## Only rows where n is 2
  filter(n == 2) |>   
  ## Swap the departure and arrival column names
  rename(
    departure_airport = "arrival_airport",
    arrival_airport   = "departure_airport"
    ) |>
  ## Set is_outbound to FALSE
  mutate(is_outbound = FALSE) |> 
  group_by(journey_id) |> 
  mutate(across(everything(), rev))   # flip the flight order across journeys

return_flights
#> # A tibble: 5 × 7
#> # Groups:   journey_id [3]
#>      ID arrival_airport departure_airport journey_id flight is_outbound     n
#>   <dbl> <chr>           <chr>                  <dbl>  <dbl> <lgl>       <dbl>
#> 1     2 MSY             IAD                        1      2 FALSE           2
#> 2     1 ZRH             MSY                        1      1 FALSE           2
#> 3     3 ZRH             LAX                        2      1 FALSE           2
#> 4     7 KEF             BOS                        5      2 FALSE           2
#> 5     6 ZRH             KEF                        5      1 FALSE           2

## Bind rows of the two tables and rearrange
flights <-
  rbind(flights, return_flights) |>
  group_by(journey_id) |>
  arrange(journey_id)  |>
  mutate(flight = row_number()) |> # recreate flight numbers
  ungroup() |>
  mutate(ID = row_number())     # recreate ID

flights
#> # A tibble: 12 × 7
#>       ID departure_airport arrival_airport journey_id flight is_outbound     n
#>    <int> <chr>             <chr>                <dbl>  <int> <lgl>       <dbl>
#>  1     1 ZRH               MSY                      1      1 TRUE            2
#>  2     2 MSY               IAD                      1      2 TRUE            2
#>  3     3 IAD               MSY                      1      3 FALSE           2
#>  4     4 MSY               ZRH                      1      4 FALSE           2
#>  5     5 ZRH               LAX                      2      1 TRUE            2
#>  6     6 LAX               ZRH                      2      2 FALSE           2
#>  7     7 ZRH               BUD                      3      1 TRUE            1
#>  8     8 VIE               ZRH                      4      1 TRUE            1
#>  9     9 ZRH               KEF                      5      1 TRUE            2
#> 10    10 KEF               BOS                      5      2 TRUE            2
#> 11    11 BOS               KEF                      5      3 FALSE           2
#> 12    12 KEF               ZRH                      5      4 FALSE           2

Created on 2022-11-01 with reprex v2.0.2

CodePudding user response:

You can filter the part of the dataframe you want to duplicate, make the changes in the variables, and then add it to the original dataframe:

df_dup = df %>%
  filter(n == 2) %>%
  group_by(`journey id`) %>% # to get the number of flights in each journey with n() in the last line of mutate
  mutate(departure_temp = arrival_airport, #see "Obs 1" below 
         arrival_airport = departure_airport,
         departure_airport = departure_temp,
         departure_temp = NULL,
         is_outbound = FALSE,
         flight = flight   n()) #if there are 1 journey, 1 -> 2, if there are 2 journeys, 1 -> 3, 2 -> 4, etc.

df = rbind(df, df_dup) %>%
  arrange(`journey id`) %>%
  mutate(ID = rownames(.))

> df
# A tibble: 12 x 7
ID    departure_airport arrival_airport `journey id` flight is_outbound     n
<chr> <chr>             <chr>                  <dbl>  <dbl> <chr>       <dbl>
1 1     ZRH               MSY                        1      1 TRUE            2
2 2     MSY               IAD                        1      2 TRUE            2
3 3     MSY               ZRH                        1      3 FALSE           2
4 4     IAD               MSY                        1      4 FALSE           2
5 5     ZRH               LAX                        2      1 TRUE            2
6 6     LAX               ZRH                        2      2 FALSE           2
7 7     ZRH               BUD                        3      1 TRUE            1
8 8     VIE               ZRH                        4      1 TRUE            1
9 9     ZRH               KEF                        5      1 TRUE            2
10 10    KEF               BOS                        5      2 TRUE            2
11 11    KEF               ZRH                        5      3 FALSE           2
12 12    BOS               KEF                        5      4 FALSE           2

Obs 1: mutate changes the column in order, so arrival_airport = departure_airport overrites the info in arrival_airport, thus we need to save it in a temporary column to pass it later on to departure_airport. There might be a better way to do this.

  • Related