I have a large data set:
head(data)
  subject stim1 stim2 Chosen outcome
1       1     2     1      2       0
2       1     3     2      2       0
3       1     3     1      1       0
4       1     2     3      3       1
5       1     1     3      1       1
6       1     2     1      1       1
tail(data)
      subject stim1 stim2 Chosen outcome
44249    3020    40    42     42       0
44250    3020    40    41     41       1
44251    3020    44    45     45       1
44252    3020    41    43     43       0
44253    3020    42    40     42       0
44254    3020    42    44     44       1
My objective is, within each subject and for each row, to find the most recent earlier row in which the same two stimuli (stim1 and stim2) were presented, and then to add columns with:
- the entry for Chosen from that row (Previous_Choice)
- the outcome variable from that row (Previous_Outcome)
- whether the number that was not chosen in that previous row (i.e. the Previous_Choice row) was chosen in any row between it and the current trial (S_choice). For example, if that row had stim1=1, stim2=2 and Chosen=2, then I am checking whether Chosen=1 in any later trial leading up to my current row (see row 6 for an example).
The tricky part is that I don't care which of the two numbers is stim1 and which is stim2. For example, if my current trial has stim1=1 and stim2=2, I want the most recent trial where (stim1=1, stim2=2) OR (stim1=2, stim2=1).
Desired outcome:
  subject stim1 stim2 Chosen outcome Previous_Choice Previous_Outcome S_choice
1       1     2     1      2       0              NA               NA       NA
2       1     3     2      2       0              NA               NA       NA
3       1     3     1      1       0              NA               NA       NA
4       1     2     3      3       1               2                0    FALSE
5       1     1     3      1       1               1                0    FALSE
6       1     2     1      1       1               2                0     TRUE
Note: S_choice is TRUE in row 6 because, after trial 1 (the most recent trial in which 1 and 2 were stim1 and stim2), 1 was chosen in rows 3 and 5.
str(data)
'data.frame': 44254 obs. of 5 variables:
$ subject: num 1 1 1 1 1 1 1 1 1 1 ...
$ stim1 : int 2 3 3 2 1 2 2 3 2 2 ...
$ stim2 : int 1 2 1 3 3 1 3 1 1 1 ...
$ Chosen : int 2 2 1 3 1 1 2 1 2 2 ...
$ outcome: int 0 0 0 1 1 1 1 0 1 0 ...
CodePudding user response:
I don't understand what S_choice means, but maybe I can help you with the other two columns.
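library(dplyr)

# Return the last element of x, or NA if x is empty.
LastOrNa <- function(x) {
  if (length(x) == 0) {
    return(NA_integer_)
  }
  x[length(x)]
}

# For each position t, return the index of the most recent earlier position
# that shows the same pair (x, y) in either order, or NA if there is none.
LastEq <- function(x, y) {
  # seq_along(x)[-1] is empty for a one-row group, unlike 2:length(x)
  res <- vapply(seq_along(x)[-1], function(t) {
    prev <- seq_len(t - 1)
    LastOrNa(which(
      (x[prev] == x[t] & y[prev] == y[t]) |
        (x[prev] == y[t] & y[prev] == x[t])
    ))
  }, integer(1))
  c(NA_integer_, res)
}

data %>%
  group_by(subject) %>%
  mutate(
    last_eq = LastEq(stim1, stim2),        # row index within the subject
    Previous_Choice = Chosen[last_eq],
    Previous_Outcome = outcome[last_eq],
    last_eq = NULL                         # drop the helper column
  ) %>%
  ungroup()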
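
For S_choice, here is a rough base R sketch under one literal reading of the question: take the option that was not chosen on the previous matching trial and check whether it was chosen on any trial strictly between that trial and the current one. The helper names PrevIndex and SChoice are made up for illustration, and the six rows are copied from head(data). Note that under this reading row 5 comes out TRUE (3 was not chosen in row 3 but was chosen in row 4), while the table in the question shows FALSE there, so the intended rule may be different.

```r
# Sketch only: PrevIndex and SChoice are hypothetical helper names.

# Index of the most recent earlier trial with the same unordered pair, else NA.
PrevIndex <- function(stim1, stim2) {
  lo <- pmin(stim1, stim2)  # order-insensitive key: smaller stimulus
  hi <- pmax(stim1, stim2)  # order-insensitive key: larger stimulus
  vapply(seq_along(lo), function(t) {
    earlier <- seq_len(t - 1)
    hits <- which(lo[earlier] == lo[t] & hi[earlier] == hi[t])
    if (length(hits) == 0) NA_integer_ else hits[length(hits)]
  }, integer(1))
}

# TRUE if the option NOT chosen at trial prev[t] was chosen on any trial
# strictly between prev[t] and t; NA when there is no previous trial.
SChoice <- function(stim1, stim2, chosen, prev) {
  vapply(seq_along(prev), function(t) {
    p <- prev[t]
    if (is.na(p)) return(NA)
    unchosen <- if (chosen[p] == stim1[p]) stim2[p] else stim1[p]
    between <- seq_len(t - 1)
    between <- between[between > p]
    any(chosen[between] == unchosen)
  }, logical(1))
}

# The six rows shown by head(data) in the question (a single subject).
trials <- data.frame(
  subject = 1,
  stim1   = c(2, 3, 3, 2, 1, 2),
  stim2   = c(1, 2, 1, 3, 3, 1),
  Chosen  = c(2, 2, 1, 3, 1, 1),
  outcome = c(0, 0, 0, 1, 1, 1)
)

prev <- PrevIndex(trials$stim1, trials$stim2)
trials$Previous_Choice  <- trials$Chosen[prev]
trials$Previous_Outcome <- trials$outcome[prev]
trials$S_choice <- SChoice(trials$stim1, trials$stim2, trials$Chosen, prev)
print(trials)
```

With the full data, the same helper could probably be called inside the grouped mutate above, e.g. S_choice = SChoice(stim1, stim2, Chosen, last_eq), placed before last_eq is set to NULL, so that the indices stay relative to each subject.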