I have a data set like this
at_ID journey_id flight is_origin is_destination is_outbound
1 1 1 NA NA TRUE
2 1 2 NA NA TRUE
3 1 3 NA NA FALSE
4 1 4 NA NA FALSE
5 2 1 NA NA FALSE
6 3 1 NA NA TRUE
7 3 2 NA NA FALSE
The columns is_origin and is_destination must be filled with TRUE/FALSE with the following conditions:
#first condition
is_origin = TRUE if min(flight) AND is_outbound = TRUE
is_destination = TRUE if max(flight) AND is_outbound =TRUE
#second condition
is_origin = TRUE if min(flight) AND if is_outbound = FALSE
is_destination = TRUE if max(flight) AND if is_outbound = FALSE
The output should look like this:
at_ID journey_id flight is_origin is_destination is_outbound
1 1 1 TRUE FALSE TRUE
2 1 2 FALSE TRUE TRUE
3 1 3 TRUE FALSE FALSE
4 1 4 FALSE TRUE FALSE
5 2 1 TRUE TRUE FALSE
6 3 1 TRUE FALSE TRUE
7 3 2 FALSE TRUE FALSE
Is there an efficient way to do this?
CodePudding user response:
Note that is_origin
in row 5 should be FALSE
.
df %>%
mutate(is_origin = flight == min(flight) & is_outbound == TRUE,
is_destination = flight == max(flight) & is_outbound == FALSE)
at_ID journey_id flight is_origin is_destination is_outbound
1 1 1 1 TRUE FALSE TRUE
2 2 1 2 FALSE FALSE TRUE
3 3 1 3 FALSE FALSE FALSE
4 4 1 4 FALSE TRUE FALSE
5 5 2 1 FALSE FALSE FALSE
6 6 3 1 TRUE FALSE TRUE
7 7 3 2 FALSE FALSE FALSE
CodePudding user response:
If we group by journey_id which seems to be necessary, line 2 (is_destination) and 3 (is_origin) can't be TRUE because they're neither min nor max flight.
library(dplyr)
df %>%
group_by(journey_id) %>%
mutate(is_origin = flight == min(flight) & (is_outbound == T | is_outbound == F),
is_destination = flight == max(flight) & (is_outbound == T | is_outbound == F)) %>%
ungroup()
# A tibble: 7 × 6
at_ID journey_id flight is_origin is_destination is_outbound
<int> <int> <int> <lgl> <lgl> <lgl>
1 1 1 1 TRUE FALSE TRUE
2 2 1 2 FALSE FALSE TRUE
3 3 1 3 FALSE FALSE FALSE
4 4 1 4 FALSE TRUE FALSE
5 5 2 1 TRUE TRUE FALSE
6 6 3 1 TRUE FALSE TRUE
7 7 3 2 FALSE TRUE FALSE