My question seems similar to old ones, but I could not find an answer to my specific problem from those, so I will ask my question aswell:
I have a dataframe in R:
AT_ID <- c(1,2,3)
DEPARTURE_AIRPORT <- c("ZRH","ZRH","ZRH")
STOPOVER_1 <- c(NA, "BEL", "DUB")
STOPOVER_2 <- c(NA, "RUO", NA)
ARRIVAL_AIRPORT <- c("IAD", "LAX","BUD")
intinerary_id <- c(NA,NA,NA)
test_df <- data.frame(AT_ID, DEPARTURE_AIRPORT, STOPOVER_1, STOPOVER_2, ARRIVAL_AIRPORT, intinerary_id)
print (test_df)
This data frame should be split up so that the flights are separate for each segment. So for flight AT_ID 1 the flight segments would be itinerary_id = 1 from ZRH to IAD, AT_ID 2 would the segment 1 from ZRH to BEL, segment 2 from BEL to RUO and segment 3 from RUO to LAX, each segment in a different row but with the same intinerary_id.
The result would look something like that:
AT_ID <- c(1,2,3,4,5,6)
DEPARTURE_AIRPORT <- c("ZRH","ZRH","BEL","RUO","ZRH","DUB")
ARRIVAL_AIRPORT <- c("IAD", "BEL","RUO", "LAX","DUB","BUD")
intinerary_id <- c(1,2,2,2,3,3)
test_df_target <- data.frame(AT_ID, DEPARTURE_AIRPORT, ARRIVAL_AIRPORT, intinerary_id)
print(test_df_target)
The split()- function and separate_columns did not work out for me as the columns are not all the same, some have NA and some 3 segments.
I hope my question is clear, otherwise please let me know, so I can specify.
Thank you in advance!
CodePudding user response:
We could use pivot_longer
like this:
library(dplyr)
library(tidyr)
test_df %>%
pivot_longer(
cols =c(DEPARTURE_AIRPORT, STOPOVER_1, STOPOVER_2),
names_to = "name",
values_to = "DEPARTURE_AIRPORT"
) %>%
filter(!is.na(DEPARTURE_AIRPORT)) %>%
mutate(intinerary_id = AT_ID,
AT_ID = row_number())%>%
select(AT_ID, DEPARTURE_AIRPORT, ARRIVAL_AIRPORT, intinerary_id)
AT_ID DEPARTURE_AIRPORT ARRIVAL_AIRPORT intinerary_id
<int> <chr> <chr> <dbl>
1 1 ZRH IAD 1
2 2 ZRH LAX 2
3 3 BEL LAX 2
4 4 RUO LAX 2
5 5 ZRH BUD 3
6 6 DUB BUD 3