Home > database >  R: How to duplicate specific rows based on one column and invert values in a data frame
R: How to duplicate specific rows based on one column and invert values in a data frame

Time:10-25

I have a data set like this

ID     departure_airport   arrival_airport   n
1      ZRH                 BUD               1
2      ZRH                 MSY               2
3      MSY                 IAD               2
4      VIE                 ZRH               1

I would like to duplicate the rows with a value of 2 in column "n" and invert their airport codes in the departure and arrival columns. The output should look like this:

ID     departure_airport   arrival_airport   n
1      ZRH                 BUD               1
2      ZRH                 MSY               2
3      MSY                 ZRH               2
4      MSY                 IAD               2
5      IAD                 MSY               2
6      VIE                 ZRH               1

Any suggestions on how I can get the resulting table? Thanks in advance.

CodePudding user response:

You can filter all rows where n equals 2 and rbind it to your data.frame in a different order:

rbind(df1, df1[df1$n == 2, c(1, 3, 2, 4)])

CodePudding user response:

We may use uncount

library(dplyr)
library(tidyr)
df1 %>%
   uncount(n, .remove = FALSE) %>%
   group_by(ID) %>%
    mutate(departure_airport = replace(departure_airport, 
      duplicated(departure_airport), first(arrival_airport)), 
    arrival_airport = replace(arrival_airport, duplicated(arrival_airport), first(departure_airport))) %>% 
    ungroup %>% 
   mutate(ID = row_number())

-output

# A tibble: 6 × 4
     ID departure_airport arrival_airport     n
  <int> <chr>             <chr>           <int>
1     1 ZRH               BUD                 1
2     2 ZRH               MSY                 2
3     3 MSY               ZRH                 2
4     4 MSY               IAD                 2
5     5 IAD               MSY                 2
6     6 VIE               ZRH                 1

data

df1 <- structure(list(ID = 1:4, departure_airport = c("ZRH", "ZRH", 
"MSY", "VIE"), arrival_airport = c("BUD", "MSY", "IAD", "ZRH"
), n = c(1L, 2L, 2L, 1L)), class = "data.frame", 
row.names = c(NA, 
-4L))

CodePudding user response:

The other answers are good - I think dplyr offers several readable methods for doing this, but in this case I'd do:

library(dplyr)

dat |>
    filter(n == 2) |>
    rename(
        departure_airport = arrival_airport,
        arrival_airport = departure_airport
    ) |>
    bind_rows(dat) |>
    select(names(dat)) |>
    arrange(ID)

#   ID departure_airport arrival_airport n
# 1  1               ZRH             BUD 1
# 2  2               MSY             ZRH 2
# 3  2               ZRH             MSY 2
# 4  3               IAD             MSY 2
# 5  3               MSY             IAD 2
# 6  4               VIE             ZRH 1

Assumes your data is called dat obviously.

  • Related