Suppose I want to make a new variable based on conditions within multiple other variables, and the condition is the same across each of these variables. I know I could use case_when(), but I'm curious to see if this can be simplified if my conditional phrase is the same for each conditional variable. I also want to know if this can be easily replicated to create multiple variables.
Example: A teacher has 3 students who have received grades for 3 tests and 3 quizzes. He wants to create a variable that says whether or not a student ever had a score of <70 on any test or quiz. So he will create two new variables as so:
ID <- c("Dave", "Joe", "Steve")
exam1 <- c(80, 100, 90)
exam2 <- c(30, 90, 88)
exam3 <- c(90, 65, 95)
quiz1 <- c(90, 90, 20)
quiz2 <- c(33, 100, 100)
quiz3 <- c(90, 90, 50)
data <- tibble(ID, exam1, exam2, exam3, quiz1, quiz2, quiz3)
data <- data %>%
mutate(
fail_exam = case_when(
exam1 < 70 ~ 1,
exam2 < 70 ~ 1,
exam3 < 70 ~ 1,
T ~ 0
),
fail_quiz = case_when(
quiz1 < 70 ~ 1,
quiz2 < 70 ~ 1,
quiz3 < 70 ~ 1,
T ~ 0
)
)
He ends up with the following output with his two new variables:
# A tibble: 3 × 9
ID exam1 exam2 exam3 quiz1 quiz2 quiz3 fail_exam fail_quiz
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Dave 80 30 90 90 33 90 1 1
2 Joe 100 90 65 90 100 90 1 0
3 Steve 90 88 95 20 100 50 0 1
Now for the sake of this example, suppose you have 100 examination categories (e.g., mid-term, final exam, homework, etc.) for which students received grades, and you want create a new variable for each one of them indicating whether or not they ever had a failing score on it. One could iteratively go through each examination category as I did above with exam and quiz using case_when(), but I'd like to know if there is a simpler way to apply a single condition (i.e., if numeric score <70) to a list of examination categories (example: c("exam", "quiz", "homework", "midterm") that follow the numbering convention I have above in order to create unique output variables such as "fail_exam" and "fail_quiz" for each one of them.
This isn't mission critical, but looking to simplify things a bit.
Thx, C
CodePudding user response:
You can use dplyr::if_any()
to test multiple variables against a predicate function:
library(dplyr)
data %>%
mutate(
fail_exam = as.numeric(if_any(exam1:exam3, ~ .x < 70)),
fail_quiz = as.numeric(if_any(quiz1:quiz3, ~ .x < 70))
)
# A tibble: 3 × 9
ID exam1 exam2 exam3 quiz1 quiz2 quiz3 fail_exam fail_quiz
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Dave 80 30 90 90 33 90 1 1
2 Joe 100 90 65 90 100 90 1 0
3 Steve 90 88 95 20 100 50 0 1
PS - also see dplyr::if_all()
.
Edit: A solution to do the same across an arbitrary number of "quiz" / "exam" categories. This creates a separate summary table of failures by ID and type, which you can then merge back into your original dataframe.
library(dplyr)
library(tidyr)
failures <- data %>%
pivot_longer(
!ID,
names_to = c("type", "number"),
names_pattern = "^(\\w )(\\d )$"
) %>%
group_by(ID, type) %>%
summarize(
fail = as.numeric(any(value < 70)),
.groups = "drop"
) %>%
ungroup() %>%
pivot_wider(
names_from = type,
names_glue = "fail_{type}",
values_from = fail
)
data %>%
left_join(failures)
CodePudding user response:
Here is a generalized way using map_dfc
and if_any
.
library(dplyr)
cols <- c("exam", "quiz")
data %>%
mutate(map_dfc(cols, ~ transmute(data, "fail_{.x}" := if_any(starts_with(.x), `<`, 70))))
output
# A tibble: 3 × 9
ID exam1 exam2 exam3 quiz1 quiz2 quiz3 fail_exam fail_quiz
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <int>
1 Dave 80 30 90 90 33 90 1 1
2 Joe 100 90 65 90 100 90 1 0
3 Steve 90 88 95 20 100 50 0 1