I have the following dataset, from a choice experiment:
ID task opt q1 q2 q3 q4 q5 q6 q7 q8
1 1 1 1 2 3 1 2 2 1 2 1
2 1 1 2 2 3 1 2 2 1 2 1
3 1 1 3 2 3 1 2 2 1 2 1
4 1 2 1 2 3 1 2 2 1 2 1
5 1 2 2 2 3 1 2 2 1 2 1
6 1 2 3 2 3 1 2 2 1 2 1
7 1 3 1 2 3 1 2 2 1 2 1
8 1 3 2 2 3 1 2 2 1 2 1
9 1 3 3 2 3 1 2 2 1 2 1
10 1 4 1 2 3 1 2 2 1 2 1
11 1 4 2 2 3 1 2 2 1 2 1
12 1 4 3 2 3 1 2 2 1 2 1
13 1 5 1 2 3 1 2 2 1 2 1
14 1 5 2 2 3 1 2 2 1 2 1
15 1 5 3 2 3 1 2 2 1 2 1
16 1 6 1 2 3 1 2 2 1 2 1
17 1 6 2 2 3 1 2 2 1 2 1
18 1 6 3 2 3 1 2 2 1 2 1
19 1 7 1 2 3 1 2 2 1 2 1
20 1 7 2 2 3 1 2 2 1 2 1
21 1 7 3 2 3 1 2 2 1 2 1
22 1 8 1 2 3 1 2 2 1 2 1
23 1 8 2 2 3 1 2 2 1 2 1
24 1 8 3 2 3 1 2 2 1 2 1
Meaning individual 1 (ID
) was asked 8 questions (task
), each of which had 3 options (opt
) she could choose from. The remaining columns (q1...q8
) reflect the option she chose in each of the choice tasks. So, she chose option 2 on question 1 and, again, option 2 on question 7.
I'm looking to turn it into a dataset that looks like this instead:
ID task opt chosen
1 1 1 1 FALSE
2 1 1 2 TRUE
3 1 1 3 FALSE
4 1 2 1 FALSE
5 1 2 2 FALSE
6 1 2 3 TRUE
7 1 3 1 TRUE
8 1 3 2 FALSE
9 1 3 3 FALSE
...
22 1 8 1 TRUE
23 1 8 2 FALSE
24 1 8 3 FALSE
In other words, a new column (chosen
) reflects TRUE
if that particular option was chosen in that particular question, and FALSE
otherwise.
I've tried all forms of melt
and reshape
and cast
but I've come up empty.
Thank you!
CodePudding user response:
You can first pivot_longer
columns q1
to q8
, then filter
for "q" number that matches the task
column. Then set the chosen
column to TRUE
if the value
column matches the opt
column. Finally removes the newly created columns.
Update: Credit to @Sotos for improving the answer. Now we set the logical column chosen
directly from the comparison of value
to opt
without an if_else
.
library(dplyr)
library(tidyr)
df %>% pivot_longer(q1:q8, names_to = "Q") %>%
filter(task == gsub("q", "", Q)) %>%
mutate(chosen = value == opt) %>%
select(-Q, -value)
ID task opt chosen
1 1 1 1 FALSE
2 1 1 2 TRUE
3 1 1 3 FALSE
4 1 2 1 FALSE
5 1 2 2 FALSE
6 1 2 3 TRUE
7 1 3 1 TRUE
8 1 3 2 FALSE
9 1 3 3 FALSE
10 1 4 1 FALSE
11 1 4 2 TRUE
12 1 4 3 FALSE
13 1 5 1 FALSE
14 1 5 2 TRUE
15 1 5 3 FALSE
16 1 6 1 TRUE
17 1 6 2 FALSE
18 1 6 3 FALSE
19 1 7 1 FALSE
20 1 7 2 TRUE
21 1 7 3 FALSE
22 1 8 1 TRUE
23 1 8 2 FALSE
24 1 8 3 FALSE
CodePudding user response:
Here is a dplyr
solution with rowwise()
and c_across()
, where you don't even need to transform data to long form.
library(dplyr)
df %>%
rowwise() %>%
mutate(chosen = c_across(q1:q8)[task] == opt) %>%
ungroup() %>%
select(!q1:q8)
# A tibble: 24 × 4
ID task opt chosen
<int> <int> <int> <lgl>
1 1 1 1 FALSE
2 1 1 2 TRUE
3 1 1 3 FALSE
4 1 2 1 FALSE
5 1 2 2 FALSE
6 1 2 3 TRUE
7 1 3 1 TRUE
8 1 3 2 FALSE
9 1 3 3 FALSE
10 1 4 1 FALSE
11 1 4 2 TRUE
12 1 4 3 FALSE
13 1 5 1 FALSE
14 1 5 2 TRUE
15 1 5 3 FALSE
16 1 6 1 TRUE
17 1 6 2 FALSE
18 1 6 3 FALSE
19 1 7 1 FALSE
20 1 7 2 TRUE
21 1 7 3 FALSE
22 1 8 1 TRUE
23 1 8 2 FALSE
24 1 8 3 FALSE