Home > Software design >  Simplifying case_when() when condition is the same across multiple variables
Simplifying case_when() when condition is the same across multiple variables

Time:11-04

Suppose I want to make a new variable based on conditions within multiple other variables, and the condition is the same across each of these variables. I know I could use case_when(), but I'm curious to see if this can be simplified if my conditional phrase is the same for each conditional variable. I also want to know if this can be easily replicated to create multiple variables.

Example: A teacher has 3 students who have received grades for 3 tests and 3 quizzes. He wants to create a variable that says whether or not a student ever had a score of <70 on any test or quiz. So he will create two new variables as so:


ID <- c("Dave", "Joe", "Steve")
exam1 <- c(80, 100, 90)
exam2 <- c(30, 90, 88)
exam3 <- c(90, 65, 95)
quiz1 <- c(90, 90, 20)
quiz2 <- c(33, 100, 100)
quiz3 <- c(90, 90, 50)

data <- tibble(ID, exam1, exam2, exam3, quiz1, quiz2, quiz3)

data <- data %>% 
  mutate(
    fail_exam = case_when(
      exam1 < 70 ~ 1,
      exam2 < 70 ~ 1,
      exam3 < 70 ~ 1,
      T ~ 0
    ),
    fail_quiz = case_when(
      quiz1 < 70 ~ 1,
      quiz2 < 70 ~ 1,
      quiz3 < 70 ~ 1,
      T ~ 0
    )
  )

He ends up with the following output with his two new variables:

# A tibble: 3 × 9
  ID    exam1 exam2 exam3 quiz1 quiz2 quiz3 fail_exam fail_quiz
  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>     <dbl>     <dbl>
1 Dave     80    30    90    90    33    90         1         1
2 Joe     100    90    65    90   100    90         1         0
3 Steve    90    88    95    20   100    50         0         1

Now for the sake of this example, suppose you have 100 examination categories (e.g., mid-term, final exam, homework, etc.) for which students received grades, and you want create a new variable for each one of them indicating whether or not they ever had a failing score on it. One could iteratively go through each examination category as I did above with exam and quiz using case_when(), but I'd like to know if there is a simpler way to apply a single condition (i.e., if numeric score <70) to a list of examination categories (example: c("exam", "quiz", "homework", "midterm") that follow the numbering convention I have above in order to create unique output variables such as "fail_exam" and "fail_quiz" for each one of them.

This isn't mission critical, but looking to simplify things a bit.

Thx, C

CodePudding user response:

You can use dplyr::if_any() to test multiple variables against a predicate function:

library(dplyr)

data %>% 
  mutate(
    fail_exam = as.numeric(if_any(exam1:exam3, ~ .x < 70)),
    fail_quiz = as.numeric(if_any(quiz1:quiz3, ~ .x < 70))
  )
# A tibble: 3 × 9
  ID    exam1 exam2 exam3 quiz1 quiz2 quiz3 fail_exam fail_quiz
  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>     <dbl>     <dbl>
1 Dave     80    30    90    90    33    90         1         1
2 Joe     100    90    65    90   100    90         1         0
3 Steve    90    88    95    20   100    50         0         1

PS - also see dplyr::if_all().

Edit: A solution to do the same across an arbitrary number of "quiz" / "exam" categories. This creates a separate summary table of failures by ID and type, which you can then merge back into your original dataframe.

library(dplyr)
library(tidyr)

failures <- data %>% 
  pivot_longer(
    !ID, 
    names_to = c("type", "number"),
    names_pattern = "^(\\w )(\\d )$"
  ) %>% 
  group_by(ID, type) %>%
  summarize(
    fail = as.numeric(any(value < 70)),
    .groups = "drop"
  ) %>%
  ungroup() %>%
  pivot_wider(
    names_from = type,
    names_glue = "fail_{type}",
    values_from = fail
  )

data %>% 
  left_join(failures)

CodePudding user response:

Here is a generalized way using map_dfc and if_any.

library(dplyr)
cols <- c("exam", "quiz")
data %>% 
  mutate(map_dfc(cols, ~ transmute(data, "fail_{.x}" :=  if_any(starts_with(.x), `<`, 70))))

output

# A tibble: 3 × 9
  ID    exam1 exam2 exam3 quiz1 quiz2 quiz3 fail_exam fail_quiz
  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>     <int>     <int>
1 Dave     80    30    90    90    33    90         1         1
2 Joe     100    90    65    90   100    90         1         0
3 Steve    90    88    95    20   100    50         0         1
  • Related