Beginner R programmer here.
I have 73 variables: 72 questions (named Q1:Q72) and a grouping variable ("Color") with values "red" or "blue". Each row is one participant. If the participant is in group "red", they have values between 5 and 76 in columns Q1:Q72. If they are in group "blue", they have values between 77 and 148 in those columns. However, I want all answers to be between 1 and 72. That is, I want to subtract 76 from Q1:Q72 if the participant is in group "blue" and subtract 4 from Q1:Q72 if the participant is in group "red".
So far, my solution has been to split the df into two new dfs ("dfBlue" and "dfRed"), then subtract 4 from "dfRed" and 76 from "dfBlue", and finally merge the two new dataframes.
Can someone help me with a more elegant solution where no new dataframe is needed?
Thanks!!
df:
Color | Q1 | Q2 | ... | Q72 |
---|---|---|---|---|
red | 5 | 46 | ... | 32 |
blue | 107 | 85 | ... | 94 |
blue | 83 | 145 | ... | 128 |
... | ... | ... | ... | ... |
red | 47 | 34 | ... | 74 |
How I want it to be:
Color | Q1 | Q2 | ... | Q72 |
---|---|---|---|---|
red | 1 | 42 | ... | 28 |
blue | 31 | 9 | ... | 18 |
blue | 7 | 69 | ... | 52 |
... | ... | ... | ... | ... |
red | 43 | 30 | ... | 70 |
CodePudding user response:
A dplyr option:
library(dplyr)
df %>% mutate(across(starts_with("Q"), ~if_else(Color == "red", .x - 4, .x - 76)))
#   Color Q1 Q2 Q72
# 1   red  1 42  28
# 2  blue 31  9  18
# 3  blue  7 69  52
# 4   red 43 30  70
PS. Please note the typo in your expected output: row 3, Q72 should be 128 - 76 = 52, not 56.
Sample data
df <- read.table(text = "Color Q1 Q2 Q72
red 5 46 32
blue 107 85 94
blue 83 145 128
red 47 34 74", header = TRUE)
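If you would rather not load dplyr, the same idea can be sketched in base R. This is only a minimal sketch under two assumptions that are not stated in the question: every question column (and no other column) has a name of the form Q followed by digits, and "Color" only ever takes the values "red" and "blue". The names `q_cols` and `offset` are made up for illustration.

```r
# Sample data copied from the answer (only three of the 72 question columns).
df <- read.table(text = "Color Q1 Q2 Q72
red 5 46 32
blue 107 85 94
blue 83 145 128
red 47 34 74", header = TRUE)

# Assumption: question columns are exactly those named Q<number>.
q_cols <- grep("^Q[0-9]+$", names(df))

# One offset per row: 4 for "red", 76 for anything else (assumed to be "blue").
offset <- ifelse(df$Color == "red", 4, 76)

# Subtracting a vector of length nrow(df) from a data frame recycles the
# vector down each column, so every row is shifted by its own offset.
df[q_cols] <- df[q_cols] - offset

# Sanity check: all answers should now lie between 1 and 72.
stopifnot(all(df[q_cols] >= 1 & df[q_cols] <= 72))

df
```

This overwrites the question columns in `df` directly, so no intermediate data frames are created. If "Color" can contain other values or `NA`, the `ifelse()` line would need to handle those cases explicitly.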