R - Find and Update Based On Most Recent Matching Columns

I have a large data set:

head(data)

  subject stim1 stim2 Chosen outcome
1       1     2     1      2       0
2       1     3     2      2       0
3       1     3     1      1       0
4       1     2     3      3       1
5       1     1     3      1       1
6       1     2     1      1       1

tail(data)
      subject stim1 stim2 Chosen outcome
44249    3020    40    42     42       0
44250    3020    40    41     41       1
44251    3020    44    45     45       1
44252    3020    41    43     43       0
44253    3020    42    40     42       0
44254    3020    42    44     44       1

My objective is, within each subject and for each row, to find the most recent row where the same two stimuli (stim1 and stim2) were presented, and then to add columns with:

the entry for Chosen from that row (Previous_Choice)
the outcome variable from that row (Previous_outcome)
whether the number that was not chosen in that previous row (i.e. the Previous_Choice row) was chosen in any row after it, leading up to the current trial (S_choice). For example, if that row had stim1=1, stim2=2 and Chosen=2, I want to know whether Chosen=1 in any trial between that row and my current row (see row 6 for an example)

The tricky part is that I don't care which of the two numbers is stim1 and which is stim2. For example, if my current trial has stim1=1 and stim2=2, I want the most recent trial where (stim1=1, stim2=2) OR (stim1=2, stim2=1).
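
To illustrate what I mean by ignoring the order, one possible way to express it is to build a key from the smaller and the larger of the two values, so that (2, 1) and (1, 2) end up identical. This is only a sketch using the first six rows above; the names `lo`, `hi` and `pair` are made up for the example and are not columns in my data.

```r
# stim1 / stim2 from the first six rows shown in head(data)
stim1 <- c(2L, 3L, 3L, 2L, 1L, 2L)
stim2 <- c(1L, 2L, 1L, 3L, 3L, 1L)

# Order-insensitive key: smaller value first, larger value second
lo <- pmin(stim1, stim2)
hi <- pmax(stim1, stim2)
pair <- paste(lo, hi, sep = "_")

pair
# "1_2" "2_3" "1_3" "2_3" "1_3" "1_2"
```

Rows 1 and 6 share the key "1_2", rows 2 and 4 share "2_3", and rows 3 and 5 share "1_3", which is the matching I am after.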

Desired outcome

  subject stim1 stim2 Chosen outcome   Previous_Choice  Previous_Outcome  S_choice 
1       1     2     1      2       0         NA                 NA         NA
2       1     3     2      2       0         NA                 NA         NA
3       1     3     1      1       0         NA                 NA         NA
4       1     2     3      3       1          2                 0        FALSE
5       1     1     3      1       1          1                 0        FALSE
6       1     2     1      1       1          2                 0        TRUE

Note: S_choice is TRUE in row 6 because, after trial 1 (where 1 and 2 were stim1 and stim2), 1 was chosen in rows 3 and 5.

  str(data)
'data.frame':   44254 obs. of  5 variables:
 $ subject: num  1 1 1 1 1 1 1 1 1 1 ...
 $ stim1  : int  2 3 3 2 1 2 2 3 2 2 ...
 $ stim2  : int  1 2 1 3 3 1 3 1 1 1 ...
 $ Chosen : int  2 2 1 3 1 1 2 1 2 2 ...
 $ outcome: int  0 0 0 1 1 1 1 0 1 0 ...

CodePudding user response:

I don't understand what S_choice means, but maybe I can help you with the other two columns.

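library(dplyr)  # provides %>%, group_by(), mutate() and last()

# Last element of x, or NA if x is empty.
LastOrNa <- function(x) {
  if (length(x) == 0) {
    return(NA_integer_)
  }
  last(x)
}

# For each position t, the index of the most recent earlier position that
# holds the same pair of values in either order (NA if there is none).
LastEq <- function(x, y) {
  vapply(seq_along(x), function(t) {
    prev <- seq_len(t - 1)  # empty for t = 1, so the first row gets NA
    LastOrNa(which(
      (x[prev] == x[t] & y[prev] == y[t]) |
        (x[prev] == y[t] & y[prev] == x[t])
    ))
  }, integer(1))
}

data %>%
  group_by(subject) %>%
  mutate(
    last_eq = LastEq(stim1, stim2),
    Previous_Choice = Chosen[last_eq],
    Previous_Outcome = outcome[last_eq],
    last_eq = NULL  # drop the helper column
  )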
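
As for S_choice, if it is read literally as "was the option not chosen on the previous matching trial chosen on any trial after it and before the current one", a rough sketch could look like the following. This is my interpretation, not something the question confirms: `SChoice` is a hypothetical helper name, and it assumes the current row itself is excluded from the check and that the rows of each subject are already in trial order.

```r
# First six rows of the question's data (subject 1 only)
d <- data.frame(
  subject = 1,
  stim1   = c(2L, 3L, 3L, 2L, 1L, 2L),
  stim2   = c(1L, 2L, 1L, 3L, 3L, 1L),
  Chosen  = c(2L, 2L, 1L, 3L, 1L, 1L),
  outcome = c(0L, 0L, 0L, 1L, 1L, 1L)
)

# Hypothetical helper: S_choice for the ordered trials of ONE subject.
SChoice <- function(stim1, stim2, Chosen) {
  n <- length(Chosen)
  res <- rep(NA, n)
  for (t in seq_len(n)) {
    prev <- seq_len(t - 1)
    # Earlier trials with the same two stimuli, in either order
    hits <- which(
      (stim1[prev] == stim1[t] & stim2[prev] == stim2[t]) |
        (stim1[prev] == stim2[t] & stim2[prev] == stim1[t])
    )
    if (length(hits) == 0) next  # no earlier match: stays NA
    p <- max(hits)               # most recent matching trial
    # The option that was NOT chosen on trial p
    other <- if (Chosen[p] == stim1[p]) stim2[p] else stim1[p]
    # Choices made strictly between trial p and trial t (assumption)
    between <- Chosen[prev][-seq_len(p)]
    res[t] <- any(between == other)
  }
  res
}

d$S_choice <- SChoice(d$stim1, d$stim2, d$Chosen)
d$S_choice
# NA NA NA FALSE TRUE TRUE
```

Under this reading, row 6 comes out TRUE as in the question, but row 5 also comes out TRUE (3 was not chosen in row 3 and was then chosen in row 4), whereas the desired table shows FALSE there, so the intended definition may be narrower than what I have assumed. For the full data the same function can be called inside the grouped mutate(), e.g. S_choice = SChoice(stim1, stim2, Chosen).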