Use 'case_when' to assign several new variables at once for a given condition-CodePudding

I have a dataframe df with two variables, df$soil and df$use. I want to add two new variables df$ef1 and df$ef2 to my dataset, based on a condition. I am doing this using 'case_when':

ef1_grassl_mineral <- 0.2
ef1_grassl_peat    <- 0.3
ef1_arable_mineral <- 0.4
ef1_arable_peat    <- 0.5

ef2_grassl_mineral <- 2.3
ef2_grassl_peat    <- 3.4
ef2_arable_mineral <- 4.5
ef2_arable_peat    <- 5.6


df <- data.frame(soil = c('mineral', 'peat', 'mineral', 'peat'),
                 use  = c('grassl', 'arable', 'arable', 'grassl'))

df <- df %>% mutate (
  ef1 = case_when((soil=='mineral' & use=='grassl') ~ ef1_grassl_mineral,
                  (soil=='peat'    & use=='grassl') ~ ef1_grassl_peat,
                  (soil=='mineral' & use=='arable') ~ ef1_arable_mineral,
                  (soil=='peat'    & use=='arable') ~ ef1_arable_peat),
  ef2 = case_when((soil=='mineral' & use=='grassl') ~ ef2_grassl_mineral,
                  (soil=='peat'    & use=='grassl') ~ ef2_grassl_peat,
                  (soil=='mineral' & use=='arable') ~ ef2_arable_mineral,
                  (soil=='peat'    & use=='arable') ~ ef2_arable_peat))

The above works fine, but I have to repeat the conditions for each variable, which makes the code lengthy. Therefore I was wondering whether there is a way to specify the condition only ONCE (e.g., soil=='mineral' & use=='arable') and then define both df$ef1 AND df$ef2. (In Syntax: IF (soil=='mineral' & use=='arable') THEN ef1=ef1_arable_mineral AND ef2=ef2_arable_mineral )

CodePudding user response：

Use a look-up table and a join instead

lookup = tribble(
  ~soil, ~use, ~ef1, ~ef2,
  "mineral", "grassl", 0.2, 2.3,
  "peat", "grassl", 0.3, 3.4,
  "mineral", "arable", 0.4, 4.5,
  "peat", "arable", 0.5, 5.6
)

Then if you have a larger data frame you need to add the ef1 and ef2 columns to based on existing soil and use columns, you do bigger_data %>% left_join(lookup, by = c("soil", "use")).

My favorite feature of look-up tables like this is that they are very easy to audit/debug. If other people need to check the values you can store the look-up table as a flat file (CSV or similar), and it is very clear even to non-technical people.

CodePudding user response：

You can use list() to store values of multiple columns and then pass to tidyr::unnest_wider().

library(tidyverse)

df %>%
  mutate(ef = case_when(
    (soil == 'mineral' & use == 'grassl') ~ list(c(0.2, 2.3)),
    (soil == 'peat'    & use == 'grassl') ~ list(c(0.3, 3.4)),
    (soil == 'mineral' & use == 'arable') ~ list(c(0.4, 4.5)),
    (soil == 'peat'    & use == 'arable') ~ list(c(0.5, 5.6)))
  ) %>%
  unnest_wider(ef, names_sep = '')

# # A tibble: 4 × 4
#   soil    use      ef1   ef2
#   <chr>   <chr>  <dbl> <dbl>
# 1 mineral grassl   0.2   2.3
# 2 peat    arable   0.5   5.6
# 3 mineral arable   0.4   4.5
# 4 peat    grassl   0.3   3.4

CodePudding user response：

Possibly not the neatest solution, but another interesting approach would be to put all the ef1 and ef2 lookup values in a list and call them by concatenating your columns:

library(tidyverse)

ef1 <- ef2 <- list()

ef1$grassl_mineral <- 0.2
ef1$grassl_peat    <- 0.3
ef1$arable_mineral <- 0.4
ef1$arable_peat    <- 0.5

ef2$grassl_mineral <- 2.3
ef2$grassl_peat    <- 3.4
ef2$arable_mineral <- 4.5
ef2$arable_peat    <- 5.6


df <- data.frame(soil = c('mineral', 'peat', 'mineral', 'peat'),
                 use  = c('grassl', 'arable', 'arable', 'grassl'))

df |> 
  mutate(ef1 = ef1[paste(use, soil, sep = "_")],
         ef2 = ef2[paste(use, soil, sep = "_")])
#      soil    use ef1 ef2
# 1 mineral grassl 0.2 2.3
# 2    peat arable 0.5 5.6
# 3 mineral arable 0.4 4.5
# 4    peat grassl 0.3 3.4