How can I create a new column by adding a value from an existing column based on conditions from a different column? Not sure how to clearly explain this so here is an example...
Here is an example of my data:
structure(list(id = c(1002L, 650L, 644L, 608L, 718L, 623L, 721L,
715L, 820L, 616L, 1001L, 634L, 1005L, 821L, 816L, 1021L, 606L,
824L, 626L, 815L), depth = c(10L, 15L, 10L, 15L, 15L, 10L, 10L,
20L, 5L, 20L, 5L, 20L, 20L, 10L, 20L, 10L, 20L, 5L, 20L, 20L),
duration = c(10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 5L,
10L, 5L, 10L, 10L, 10L, 10L, 10L, 10L, 5L, 10L, 10L), catch = c(183L,
1216L, 159L, 232L, 1260L, 35L, 149L, 1047L, 907L, 1123L,
75L, 800L, 517L, 7L, 958L, 178L, 1127L, 16512L, 504L, 7890L
), CPH = c(1098L, 7296L, 954L, 1392L, 7560L, 210L, 894L,
6282L, 10884L, 6738L, 900L, 4800L, 3102L, 42L, 5748L, 1068L,
6762L, 198144L, 3024L, 47340L), distance = c(771.6666667,
771.6666667, 771.6666667, 771.6666667, 771.6666667, 771.6666667,
771.6666667, 771.6666667, 385.8333333, 771.6666667, 385.8333333,
771.6666667, 771.6666667, 771.6666667, 771.6666667, 771.6666667,
771.6666667, 385.8333333, 771.6666667, 771.6666667)), class = "data.frame", row.names = c(NA,
-20L))
I need to create a new column 'totaldist' based on the following conditions from other existing columns:
IF depth == 5, then the new 'totaldist' column should be created using = distance 111
IF depth == 10, then 'totaldist' = distance 130
IF depth == 15, then 'totaldist' = distance 185
IF depth == 20, then 'totaldist' = distance 259
For example, for ID 606 with depth 20 and distance 771.6667, the new column ('totaldist') should show a value of 1030.667 (distance 259).
I am not sure if I should use if_else, or if combined with else if, but the examples I have found using these functions have not worked for me. Should I be trying this with lapply? Not sure how to go about this. I hope this is clear. Thank you!
CodePudding user response:
Since you have multiple criteria, case_when
may be better than ifelse
.
library(dplyr)
df %>% mutate(totaldist = case_when(depth == 5 ~ distance 111,
depth == 10 ~ distance 130,
depth == 15 ~ distance 185,
depth == 20 ~ distance 259,
TRUE ~ NA_real_))
id depth duration catch CPH distance totaldist
1 1002 10 10 183 1098 771.6667 901.6667
2 650 15 10 1216 7296 771.6667 956.6667
3 644 10 10 159 954 771.6667 901.6667
4 608 15 10 232 1392 771.6667 956.6667
5 718 15 10 1260 7560 771.6667 956.6667
6 623 10 10 35 210 771.6667 901.6667
7 721 10 10 149 894 771.6667 901.6667
8 715 20 10 1047 6282 771.6667 1030.6667
9 820 5 5 907 10884 385.8333 496.8333
10 616 20 10 1123 6738 771.6667 1030.6667
11 1001 5 5 75 900 385.8333 496.8333
12 634 20 10 800 4800 771.6667 1030.6667
13 1005 20 10 517 3102 771.6667 1030.6667
14 821 10 10 7 42 771.6667 901.6667
15 816 20 10 958 5748 771.6667 1030.6667
16 1021 10 10 178 1068 771.6667 901.6667
17 606 20 10 1127 6762 771.6667 1030.6667
18 824 5 5 16512 198144 385.8333 496.8333
19 626 20 10 504 3024 771.6667 1030.6667
20 815 20 10 7890 47340 771.6667 1030.6667
CodePudding user response:
Another way is to join with a lookup table:
library(dplyr)
lookup <- data.frame(
depth = c(5, 10, 15, 20),
added_dist = c(111, 130, 185, 259)
)
my_df %>%
left_join(lookup, 'depth') %>%
mutate(totaldist = distance added_dist)
This separates the logic from the coded values, which can be nice.