Home > Blockchain >  Filter for one specific value in a column if someone has both
Filter for one specific value in a column if someone has both

Time:07-21

In my current data frame, some students have both a math course as well as an algebra or pre-algebra course (in the below example, that is students 2 and 4).

student_courses <- 
  tribble(
    ~ student_id, ~class, ~letter_grade, 
    001, "math", "A",
    001, "science", "A",
    001, "PE", "C",
    002, "math", "B",
    002, "algebra", "A",
    002, "science", "B",
    002, "PE", "B",
    003, "math", "B",
    003, "science", "B",
    003, "PE", "B",
    004, "math", "B",
    004, "pre-algebra", "B",
    004, "science", "B",
    004, "PE", "B"
  )

I need to filter the data frame in such a way that if a student has both a math and pre-algebra/algebra grade, their math grade will be filtered out. Below is what I need the data frame to look like.

student_courses_answer <- 
  tribble(
    ~ student_id, ~class, ~letter_grade, 
    001, "math", "A",
    001, "science", "A",
    001, "PE", "C",
    002, "algebra", "A",
    002, "science", "B",
    002, "PE", "B",
    003, "math", "B",
    003, "science", "B",
    003, "PE", "B",
    004, "pre-algebra", "B",
    004, "science", "B",
    004, "PE", "B"
  )

I've tried grouping by student_id and then stacking filters, but this doesn't work. I'm stumped.

Any Ideas?

CodePudding user response:

The condition is !(any(grepl("algebra", class)) & class == "math") in each ID. The use of grepl("algebra", class) can match both "algebra" and "pre-algebra".

library(dplyr)

student_courses %>%
  group_by(student_id) %>%
  filter( !(any(grepl("algebra", class)) & class == "math") ) %>%
  ungroup()

# # A tibble: 12 × 3
#    student_id class       letter_grade
#         <dbl> <chr>       <chr>       
#  1          1 math        A           
#  2          1 science     A           
#  3          1 PE          C           
#  4          2 algebra     A           
#  5          2 science     B           
#  6          2 PE          B           
#  7          3 math        B           
#  8          3 science     B           
#  9          3 PE          B           
# 10          4 pre-algebra B           
# 11          4 science     B           
# 12          4 PE          B

CodePudding user response:

A dplyr approach:

library(dplyr)
student_courses  |>
    group_by(student_id)  |>
    mutate(
        remove_math = ("math" %in% class) & ("algebra" %in% class | "pre-algebra" %in% class)
    )  |>
    filter(
        !(class == "math" & remove_math)
    )  |>
    select(-remove_math) |>
    ungroup()

# # A tibble: 12 x 3
#    student_id class       letter_grade
#         <dbl> <chr>       <chr>       
#  1          1 math        A
#  2          1 science     A
#  3          1 PE          C
#  4          2 algebra     A
#  5          2 science     B
#  6          2 PE          B
#  7          3 math        B
#  8          3 science     B
#  9          3 PE          B
# 10          4 pre-algebra B
# 11          4 science     B
# 12          4 PE          B

EDIT: Added algebra to the condition (was previously just pre-algebra).

  • Related