I have a dataframe df
with two variables, df$soil
and df$use
. I want to add two new variables df$ef1
and df$ef2
to my dataset, based on a condition. I am doing this using 'case_when':
ef1_grassl_mineral <- 0.2
ef1_grassl_peat <- 0.3
ef1_arable_mineral <- 0.4
ef1_arable_peat <- 0.5
ef2_grassl_mineral <- 2.3
ef2_grassl_peat <- 3.4
ef2_arable_mineral <- 4.5
ef2_arable_peat <- 5.6
df <- data.frame(soil = c('mineral', 'peat', 'mineral', 'peat'),
use = c('grassl', 'arable', 'arable', 'grassl'))
df <- df %>% mutate (
ef1 = case_when((soil=='mineral' & use=='grassl') ~ ef1_grassl_mineral,
(soil=='peat' & use=='grassl') ~ ef1_grassl_peat,
(soil=='mineral' & use=='arable') ~ ef1_arable_mineral,
(soil=='peat' & use=='arable') ~ ef1_arable_peat),
ef2 = case_when((soil=='mineral' & use=='grassl') ~ ef2_grassl_mineral,
(soil=='peat' & use=='grassl') ~ ef2_grassl_peat,
(soil=='mineral' & use=='arable') ~ ef2_arable_mineral,
(soil=='peat' & use=='arable') ~ ef2_arable_peat))
The above works fine, but I have to repeat the conditions for each variable, which makes the code lengthy.
Therefore I was wondering whether there is a way to specify the condition only ONCE (e.g., soil=='mineral' & use=='arable')
and then define both df$ef1
AND df$ef2
. (In Syntax: IF (soil=='mineral' & use=='arable') THEN ef1=ef1_arable_mineral AND ef2=ef2_arable_mineral )
CodePudding user response:
Use a look-up table and a join instead
lookup = tribble(
~soil, ~use, ~ef1, ~ef2,
"mineral", "grassl", 0.2, 2.3,
"peat", "grassl", 0.3, 3.4,
"mineral", "arable", 0.4, 4.5,
"peat", "arable", 0.5, 5.6
)
Then if you have a larger data frame you need to add the ef1
and ef2
columns to based on existing soil
and use
columns, you do bigger_data %>% left_join(lookup, by = c("soil", "use"))
.
My favorite feature of look-up tables like this is that they are very easy to audit/debug. If other people need to check the values you can store the look-up table as a flat file (CSV or similar), and it is very clear even to non-technical people.
CodePudding user response:
You can use list()
to store values of multiple columns and then pass to tidyr::unnest_wider()
.
library(tidyverse)
df %>%
mutate(ef = case_when(
(soil == 'mineral' & use == 'grassl') ~ list(c(0.2, 2.3)),
(soil == 'peat' & use == 'grassl') ~ list(c(0.3, 3.4)),
(soil == 'mineral' & use == 'arable') ~ list(c(0.4, 4.5)),
(soil == 'peat' & use == 'arable') ~ list(c(0.5, 5.6)))
) %>%
unnest_wider(ef, names_sep = '')
# # A tibble: 4 × 4
# soil use ef1 ef2
# <chr> <chr> <dbl> <dbl>
# 1 mineral grassl 0.2 2.3
# 2 peat arable 0.5 5.6
# 3 mineral arable 0.4 4.5
# 4 peat grassl 0.3 3.4
CodePudding user response:
Possibly not the neatest solution, but another interesting approach would be to put all the ef1
and ef2
lookup values in a list and call them by concatenating your columns:
library(tidyverse)
ef1 <- ef2 <- list()
ef1$grassl_mineral <- 0.2
ef1$grassl_peat <- 0.3
ef1$arable_mineral <- 0.4
ef1$arable_peat <- 0.5
ef2$grassl_mineral <- 2.3
ef2$grassl_peat <- 3.4
ef2$arable_mineral <- 4.5
ef2$arable_peat <- 5.6
df <- data.frame(soil = c('mineral', 'peat', 'mineral', 'peat'),
use = c('grassl', 'arable', 'arable', 'grassl'))
df |>
mutate(ef1 = ef1[paste(use, soil, sep = "_")],
ef2 = ef2[paste(use, soil, sep = "_")])
# soil use ef1 ef2
# 1 mineral grassl 0.2 2.3
# 2 peat arable 0.5 5.6
# 3 mineral arable 0.4 4.5
# 4 peat grassl 0.3 3.4