Creating groups based on values in multiple columns in R

Time:11-23

I have a data frame like this:

ID <- c("A", "B", "C", "D", "E", "F")
Score1 <- c("(25-30)", "(31-40)", "(41-60)", "(25-30)", "(25-30)", "(25-30)")  # "(25-30)" is the low problems cut-off
Score2 <- c("(0-5)", "(6-11)", "(25-30)", "(6-11)", "(0-5)", "(0-5)")          # "(0-5)" is the low problems cut-off
Score3 <- c("(12-20)", "(21-42)", "(43-55)", "(12-20)", "(21-42)", "(12-20)")  # "(12-20)" is the low problems cut-off
Score4 <- c("(1-20)", "(21-60)", "(61-80)", "(1-20)", "(1-20)", "(1-20)")      # "(1-20)" is the low problems cut-off
df <- data.frame(ID, Score1, Score2, Score3, Score4)

I want to assign participants to groups based on the categories they fall into for Score1 to Score4.

The scoring categories are my cut-offs for low problems, moderate problems, and high problems.

The idea is that any participant who falls into the moderate or high problems category on at least one score goes to the experimental group, and those who fall into the low problems category on all scores go to the control group.

So I tried the code below, which a friend here suggested for an earlier question of mine. That question was slightly different, which I guess is why the code follows a different logic.

With it, I wanted to tell R to put those who fell into the first (low problems) category for all Scores into the control group, and everyone else into the experimental group.

library(dplyr)  # provides %>%, mutate() and case_when()

df <- df %>%
  mutate(Group = case_when(
    Score1 == "(25-30)" | Score2 == "(0-5)" | Score3 == "(12-20)" | Score4 == "(1-20)"
    ~ "Control",
    TRUE ~ "Experimental"))

But this is what you get in the end:

  ID  Score1  Score2  Score3  Score4        Group
1  A (25-30)   (0-5) (12-20)  (1-20)      Control
2  B (31-40)  (6-11) (21-42) (21-60) Experimental
3  C (41-60) (25-30) (43-55) (61-80) Experimental
4  D (25-30)  (6-11) (12-20)  (1-20)      Control
5  E (25-30)   (0-5) (21-42)  (1-20)      Control
6  F (25-30)   (0-5) (12-20)  (1-20)      Control

As you can see, participants D and E are in the control group even though Score2 for participant D and Score3 for participant E fall in the moderate category, i.e. a category that I didn't specify in the code.

The code sends a participant to the experimental group only if none of their scores is in the low problems category. How should I modify my code?

Sorry for my long question. Thanks a lot!

Answer:

IMHO it's easier to check whether all scores are in the low category. With | a single low score is enough to land in "Control"; with & every score has to be low. Using & and an if_else you could do:

library(dplyr, warn.conflicts = FALSE)

df |>
  mutate(Group = if_else(
    # Control only when every score is in its low problems category
    Score1 == "(25-30)" & Score2 == "(0-5)" & Score3 == "(12-20)" & Score4 == "(1-20)",
    "Control", "Experimental"))
#>   ID  Score1  Score2  Score3  Score4        Group
#> 1  A (25-30)   (0-5) (12-20)  (1-20)      Control
#> 2  B (31-40)  (6-11) (21-42) (21-60) Experimental
#> 3  C (41-60) (25-30) (43-55) (61-80) Experimental
#> 4  D (25-30)  (6-11) (12-20)  (1-20) Experimental
#> 5  E (25-30)   (0-5) (21-42)  (1-20) Experimental
#> 6  F (25-30)   (0-5) (12-20)  (1-20)      Control
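
If more score columns get added later, chaining the comparisons by hand becomes tedious. Below is a minimal base R sketch of the same "all scores low" check driven by a lookup vector. The names low_cutoffs and all_low are made up for this illustration and are not part of the original code; the data frame is recreated from the question so that the snippet runs on its own.

```r
# Data from the question, recreated so the snippet is self-contained
df <- data.frame(
  ID     = c("A", "B", "C", "D", "E", "F"),
  Score1 = c("(25-30)", "(31-40)", "(41-60)", "(25-30)", "(25-30)", "(25-30)"),
  Score2 = c("(0-5)", "(6-11)", "(25-30)", "(6-11)", "(0-5)", "(0-5)"),
  Score3 = c("(12-20)", "(21-42)", "(43-55)", "(12-20)", "(21-42)", "(12-20)"),
  Score4 = c("(1-20)", "(21-60)", "(61-80)", "(1-20)", "(1-20)", "(1-20)")
)

# Hypothetical lookup: the low problems category for each score column
low_cutoffs <- c(Score1 = "(25-30)", Score2 = "(0-5)",
                 Score3 = "(12-20)", Score4 = "(1-20)")

# One logical vector per column (TRUE where the score is in the low category),
# then combine them with & so a row is TRUE only if every score is low
is_low  <- Map(function(col, low) df[[col]] == low, names(low_cutoffs), low_cutoffs)
all_low <- Reduce(`&`, is_low)

df$Group <- ifelse(all_low, "Control", "Experimental")
df
```

This should give the same grouping as the if_else version above (A and F in Control, everyone else in Experimental). Adding another score then only requires a new entry in low_cutoffs.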