I have a data frame where I would like to check whether people identified the right themes in a memory test. Each participant saw a different set of stimuli, so this is slightly more complicated than I expected. The first participant, for instance, saw the suicide, memory, and time themes, so a 1 in those columns is good; a 1 in a column for a theme they didn't see is bad. Participant 1 below identified all of their images correctly: they were shown suicide, memory, and time, have a 1 in those columns, and have a 0 in the others. The next participant, however, marked alcohol, which they didn't see, and left time at 0 even though they did see it. I would like to create four additional columns that show 1 if they got the theme right (saw the theme and marked 1, or didn't see the theme and marked 0) and 0 if they got it wrong (saw the theme and marked 0, or didn't see the theme and marked 1).
I'm a little at a loss on how to do this and would appreciate any help!
list <- c("suicide memory time", "suicide vomit time", "vomit alcohol time", " ",
          " ", "alcohol suicide children")
id      <- 1:6
suicide <- c(1, 1, 0, 0, 0, 1)
memory  <- c(1, 0, 0, 0, 0, 0)
alcohol <- c(0, 1, 1, 1, 1, 1)
time    <- c(1, 0, 1, 1, 1, 0)
foil1   <- c(0, 0, 0, 0, 0, 0)
foil2   <- c(0, 0, 1, 0, 0, 0)
# stringsAsFactors = FALSE keeps `list` as character on R < 4.0
df <- data.frame(list, id, suicide, memory, alcohol, time, foil1, foil2,
                 stringsAsFactors = FALSE)
How do I create the four new columns (suicide_score, memory_score, etc.) that show 0/1 for each participant based on what they actually saw?
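The scoring rule described above is an equality check: a theme scores 1 exactly when "was it shown?" agrees with "was it marked?". A minimal sketch for a single participant, using participant 2's values from the example data; score_theme, seen2, and marked2 are hypothetical names introduced only for illustration.

```r
# Hypothetical helper: 1 when the mark agrees with whether the theme was shown
# (shown & marked 1, or not shown & marked 0), otherwise 0.
score_theme <- function(seen, marked) {
  (seen == (marked == 1)) * 1L
}

# Participant 2 was shown "suicide vomit time"; their marks for the four themes:
seen2   <- c(suicide = TRUE, memory = FALSE, alcohol = FALSE, time = TRUE)
marked2 <- c(suicide = 1,    memory = 0,     alcohol = 1,     time = 0)

score_theme(seen2, marked2)
# returns c(suicide = 1L, memory = 1L, alcohol = 0L, time = 0L)
```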
Answer:
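# the six indicator columns: suicide, memory, alcohol, time, foil1, foil2
nms <- names(df)[3:8]
# for each participant, the position of each column name among the words of `list` (0 = absent)
out <- t(sapply(strsplit(df$list, " "), match, x = nms, nomatch = 0L))
colnames(out) <- paste0(nms, "_score")
# 1 if the theme was among the participant's stimuli, 0 otherwise
cbind(df, (out > 0) * 1L)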
#                       list id suicide memory alcohol time foil1 foil2 suicide_score memory_score alcohol_score time_score foil1_score foil2_score
# 1      suicide memory time  1       1      1       0    1     0     0             1            1             0          1           0           0
# 2       suicide vomit time  2       1      0       1    0     0     0             1            0             0          1           0           0
# 3       vomit alcohol time  3       0      0       1    1     0     1             0            0             1          1           0           0
# 4                           4       0      0       1    1     0     0             0            0             0          0           0           0
# 5                           5       0      0       1    1     0     0             0            0             0          0           0           0
# 6 alcohol suicide children  6       1      0       1    0     0     0             1            0             1          0           0           0
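The code above flags which themes each participant was shown; the score asked for in the question also compares that with what they marked. A sketch of that last step in base R, scoring all six indicator columns (drop foil1 and foil2 from themes if only the four themes are wanted). The names themes, seen, marked, score, and result are illustrative, and %in% stands in for match() since only membership matters here.

```r
# Example data from the question; stringsAsFactors = FALSE keeps `list` character
df <- data.frame(
  list    = c("suicide memory time", "suicide vomit time", "vomit alcohol time",
              " ", " ", "alcohol suicide children"),
  id      = 1:6,
  suicide = c(1, 1, 0, 0, 0, 1),
  memory  = c(1, 0, 0, 0, 0, 0),
  alcohol = c(0, 1, 1, 1, 1, 1),
  time    = c(1, 0, 1, 1, 1, 0),
  foil1   = c(0, 0, 0, 0, 0, 0),
  foil2   = c(0, 0, 1, 0, 0, 0),
  stringsAsFactors = FALSE
)

themes <- names(df)[3:8]

# seen[i, j] is TRUE when theme j is among participant i's words in `list`
words <- strsplit(df$list, " ")
seen  <- t(vapply(words, function(w) themes %in% w, logical(length(themes))))
colnames(seen) <- themes

# marked[i, j] is TRUE when participant i put a 1 in column j
marked <- as.matrix(df[themes]) == 1

# 1 when the mark agrees with what was shown, 0 otherwise
score <- (seen == marked) * 1L
colnames(score) <- paste0(themes, "_score")

result <- cbind(df, score)
result
```

With this rule, participant 1 and participant 6 score 1 on every column, while participant 2 scores 0 on alcohol and time.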
Answer:
Here is a rather verbose approach using the tidyverse and nnet packages:
library(nnet)       # class.ind()
library(tidyverse)
df %>%
  select(list, id) %>%
  separate_rows(list) %>%                     # one row per word of `list`
  mutate(list = as.factor(list)) %>%
  cbind((class.ind(.$list) == 1)*1) %>%       # nnet: one 0/1 indicator column per word
  group_by(id) %>%
  mutate(list = toString(list)) %>%
  summarise(across(-c(list, V1), sum)) %>%    # count each word per participant
  rename_with(., ~paste(., "score", sep = "_")) %>%
  rename(id = id_score) %>%                   # undo the suffix on the key column
  right_join(df, by = "id") %>%               # add back the original columns
  relocate(list:foil2, everything())
# A tibble: 6 x 14
  list                       suicide memory alcohol  time foil1 foil2    id alcohol_score children_score memory_score suicide_score time_score vomit_score
  <chr>                        <dbl>  <dbl>   <dbl> <dbl> <dbl> <dbl> <int>         <dbl>          <dbl>        <dbl>         <dbl>      <dbl>       <dbl>
1 "suicide memory time"            1      1       0     1     0     0     1             0              0            1             1          1           0
2 "suicide vomit time"             1      0       1     0     0     0     2             0              0            0             1          1           1
3 "vomit alcohol time"             0      0       1     1     0     1     3             1              0            0             0          1           1
4 " "                              0      0       1     1     0     0     4             0              0            0             0          0           0
5 " "                              0      0       1     1     0     0     5             0              0            0             0          0           0
6 "alcohol suicide children"       1      0       1     0     0     0     6             1              1            0             1          0           0
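The *_score columns above count the words each participant was shown; turning them into the right/wrong score still means comparing them with the marked columns. A sketch that does the comparison directly in dplyr, assuming dplyr 1.0 or later (for across() and cur_column()) and assuming each column name appears verbatim as a word in list. It swaps the separate_rows()/class.ind() step for a whole-word regex match with base grepl(); themes and scored are illustrative names.

```r
library(dplyr)

# Example data from the question; stringsAsFactors = FALSE keeps `list` character
df <- data.frame(
  list    = c("suicide memory time", "suicide vomit time", "vomit alcohol time",
              " ", " ", "alcohol suicide children"),
  id      = 1:6,
  suicide = c(1, 1, 0, 0, 0, 1),
  memory  = c(1, 0, 0, 0, 0, 0),
  alcohol = c(0, 1, 1, 1, 1, 1),
  time    = c(1, 0, 1, 1, 1, 0),
  foil1   = c(0, 0, 0, 0, 0, 0),
  foil2   = c(0, 0, 1, 0, 0, 0),
  stringsAsFactors = FALSE
)

themes <- c("suicide", "memory", "alcohol", "time", "foil1", "foil2")

scored <- df %>%
  mutate(across(
    all_of(themes),
    # seen:   the column name occurs as a whole word in `list`
    # marked: the participant's 0/1 value in that column
    # score:  1 when the two agree, 0 otherwise
    ~ as.integer(grepl(paste0("\\b", cur_column(), "\\b"), list, perl = TRUE) == (.x == 1)),
    .names = "{.col}_score"
  ))

scored
```

Unlike the indicator approach, this never creates columns for words such as vomit or children that have no matching theme column.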