Home > front end >  R: Insert values in new column respecting two conditions that affect other columns
R: Insert values in new column respecting two conditions that affect other columns

Time:11-09

I have a data set like this

at_ID   journey_id  flight  is_origin  is_destination  is_outbound
1       1           1       NA         NA              TRUE
2       1           2       NA         NA              TRUE
3       1           3       NA         NA              FALSE
4       1           4       NA         NA              FALSE
5       2           1       NA         NA              FALSE
6       3           1       NA         NA              TRUE
7       3           2       NA         NA              FALSE

The columns is_origin and is_destination must be filled with TRUE/FALSE with the following conditions:

#first condition
    is_origin = TRUE if min(flight) AND is_outbound = TRUE 
    is_destination = TRUE if max(flight) AND is_outbound =TRUE
#second condition
    is_origin = TRUE if min(flight) AND if is_outbound = FALSE
    is_destination = TRUE if max(flight) AND if is_outbound = FALSE

The output should look like this:

at_ID   journey_id  flight    is_origin  is_destination  is_outbound
    1       1           1     TRUE       FALSE           TRUE
    2       1           2     FALSE      TRUE            TRUE
    3       1           3     TRUE       FALSE           FALSE
    4       1           4     FALSE      TRUE            FALSE
    5       2           1     TRUE       TRUE            FALSE
    6       3           1     TRUE       FALSE           TRUE
    7       3           2     FALSE      TRUE            FALSE

Is there an efficient way to do this?

CodePudding user response:

Note that is_origin in row 5 should be FALSE.

df %>% 
  mutate(is_origin = flight == min(flight) & is_outbound == TRUE,
         is_destination = flight == max(flight) & is_outbound == FALSE)

  at_ID journey_id flight is_origin is_destination is_outbound
1     1          1      1      TRUE          FALSE        TRUE
2     2          1      2     FALSE          FALSE        TRUE
3     3          1      3     FALSE          FALSE       FALSE
4     4          1      4     FALSE           TRUE       FALSE
5     5          2      1     FALSE          FALSE       FALSE
6     6          3      1      TRUE          FALSE        TRUE
7     7          3      2     FALSE          FALSE       FALSE

CodePudding user response:

If we group by journey_id which seems to be necessary, line 2 (is_destination) and 3 (is_origin) can't be TRUE because they're neither min nor max flight.

library(dplyr)

df %>% 
  group_by(journey_id) %>% 
  mutate(is_origin = flight == min(flight) & (is_outbound == T | is_outbound == F), 
         is_destination = flight == max(flight) & (is_outbound == T | is_outbound == F)) %>% 
  ungroup()
# A tibble: 7 × 6
  at_ID journey_id flight is_origin is_destination is_outbound
  <int>      <int>  <int> <lgl>     <lgl>          <lgl>
1     1          1      1 TRUE      FALSE          TRUE
2     2          1      2 FALSE     FALSE          TRUE
3     3          1      3 FALSE     FALSE          FALSE
4     4          1      4 FALSE     TRUE           FALSE
5     5          2      1 TRUE      TRUE           FALSE
6     6          3      1 TRUE      FALSE          TRUE
7     7          3      2 FALSE     TRUE           FALSE
  • Related