In a large dataframe ("bind") with three columns, I have to add a fourth column whose values depend conditionally on the first three columns.
It's about travel. A person (id) has several options to travel (e.g. bus, car, train, ...), and the set of options differs from person to person. Sometimes a person has several train options for the same trip from A to B, for example.
My dataframe looks like this:
main_mode | duration | id |
---|---|---|
train | 1250 | 1 |
train | 900 | 1 |
car | 540 | 1 |
train | 650 | 2 |
car | 450 | 2 |
plane | 350 | 2 |
train | 350 | 3 |
car | 890 | 3 |
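For anyone who wants to reproduce this, the table above can be rebuilt roughly like this. It is only a sketch: the integer column types are an assumption, and my real dataframe is much larger.

```r
# Rebuild the example table from the question in base R.
# Assumption: duration and id are stored as integers.
bind <- data.frame(
  main_mode = c("train", "train", "car", "train", "car", "plane", "train", "car"),
  duration  = c(1250L, 900L, 540L, 650L, 450L, 350L, 350L, 890L),
  id        = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L)
)

print(bind)
```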
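Now I want to create a new column "train_avail". If a train journey takes more than twice as long as the car journey of the same person (id), I would like to write train_avail = 0 in the new column, otherwise 1. For example, for id 1 the car takes 540, so the train journey of 1250 (more than 2 * 540 = 1080) gets 0 and the one of 900 gets 1. Rows that are not train journeys should get NA.
What I would like to have: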
main_mode | duration | id | train_avail |
---|---|---|---|
train | 1250 | 1 | 0 |
train | 900 | 1 | 1 |
car | 540 | 1 | NA |
train | 650 | 2 | 1 |
car | 450 | 2 | NA |
plane | 350 | 2 | NA |
train | 350 | 3 | 1 |
car | 890 | 3 | NA |
This is what I tried, but it obviously doesn't work. In my complete cluelessness, I have no idea how to write the code.
bind %>%
group_by(id) %>%
mutate(train_avail = ifelse((bind$main_mode = train) & (duration = 2 * (bind$main_mode = car)) ~ '0', '1'))
I'm sorry, I have no idea at all how this should work. I've only been working with R for a few days.
Thanks for your help!!
CodePudding user response:
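library(dplyr)

# df is the dataframe from the question (called "bind" there)
df |>
  group_by(id) |>
  mutate(train_avail = {
    # Fastest car journey for this person
    car <- min(duration[main_mode == "car"])
    # Train rows get 0 or 1; every other mode gets NA
    if_else(main_mode == "train",
            if_else(duration > 2 * car, 0, 1),
            NA_real_)
  })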
##> # A tibble: 8 × 4
##> # Groups:   id [3]
##>   main_mode duration    id train_avail
##>   <chr>        <int> <int>       <dbl>
##> 1 train         1250     1           0
##> 2 train          900     1           1
##> 3 car            540     1          NA
##> 4 train          650     2           1
##> 5 car            450     2          NA
##> 6 plane          350     2          NA
##> 7 train          350     3           1
##> 8 car            890     3          NA
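One caveat with the approach above: if a person has no car row at all, `min()` over an empty vector returns `Inf` (with a warning), so every train row for that person ends up as 1. Below is a minimal sketch of a variant that returns NA in that case, assuming that is what you want when there is nothing to compare against. The extra id 4 row is made up purely for illustration and is not in the original data.

```r
library(dplyr)

# Example data from the question, plus one hypothetical person (id 4)
# who has a train option but no car option.
bind <- data.frame(
  main_mode = c("train", "train", "car", "train", "car", "plane", "train", "car", "train"),
  duration  = c(1250L, 900L, 540L, 650L, 450L, 350L, 350L, 890L, 700L),
  id        = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 4L)
)

result <- bind |>
  group_by(id) |>
  mutate(train_avail = {
    car_durations <- duration[main_mode == "car"]
    # Assumption: without any car journey the comparison is undefined, so use NA
    car <- if (length(car_durations) > 0) min(car_durations) else NA_real_
    case_when(
      main_mode != "train" ~ NA_real_,  # only train rows are rated
      duration > 2 * car   ~ 0,         # more than twice the car duration
      duration <= 2 * car  ~ 1          # at most twice the car duration
    )                                   # no match (car is NA) falls through to NA
  }) |>
  ungroup()

print(result)
```

The `ungroup()` at the end is optional; it just avoids carrying the grouping into later steps.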