R studio 4.1.2: Dynamically check values for a cumulative pattern. Null following values if that pat

Time:09-13

This relates to another problem I posted, but I did not quite ask the right question. If anyone can help with this, it would really be appreciated.

I have a DF with several players' answers to 100 questions in a quiz. The example data frame below has 10 questions and 10 players. It is not the real data, which is not really from a quiz, but the principle is the same.

My goal is to create a function that checks when a player has answered 3 consecutive questions incorrectly at any point during their answers, and then changes all of their following answers to the string "disc". I would also like the threshold to be a parameter, so it could be 4 or 5 incorrect answers, etc. In the df: 1 = correct, 0 = incorrect, and 2 = unanswered. Unanswered counts as incorrect, but I do not want to recode it as 0.
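A minimal sketch of the rule for a single player's answer vector, assuming the coding above (1 = correct, 0 and 2 = incorrect). The helper name `disc_after_streak()` and its argument `k` are hypothetical. It computes the length of the current run of incorrect answers at each position and, once a run reaches `k`, replaces every later answer with "disc".

```r
# Hypothetical helper: x is one player's answers (1 = correct, 0/2 = incorrect),
# k is the number of consecutive incorrect answers that triggers new_value.
disc_after_streak <- function(x, k = 3, new_value = "disc") {
  miss <- x != 1                          # TRUE for 0 (incorrect) and 2 (unanswered)
  # Length of the current run of misses at each position: every correct answer
  # starts a new group, and cumsum() counts the misses within that group.
  streak <- ave(as.integer(miss), cumsum(!miss), FUN = cumsum)
  hit <- which(streak >= k)[1]            # position where the first k-run is completed
  out <- as.character(x)
  if (!is.na(hit) && hit < length(x)) {
    out[(hit + 1):length(x)] <- new_value # answers after the run are discarded
  }
  out
}

disc_after_streak(c(1, 0, 2, 0, 1, 1), k = 3)
# returns c("1", "0", "2", "0", "disc", "disc")
```

The answer that completes the run is kept and only the following ones change, which matches the expectation that three misses in q1 to q3 give "disc" from q4 onwards.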

df=data.frame(playerID=numeric(),
              q1=numeric(),
              q2=numeric(),
              q3=numeric(),
              q4=numeric(),
              q5=numeric(),
              q6=numeric(),
              q7=numeric(),
              q8=numeric(),
              q9=numeric(),
              q10=numeric())

set.seed(1)
for(i in 1:10){
  # playerID followed by 10 answers, each drawn at random from 0, 1 or 2
  list_i=c(i,replicate(10, sample(0:2,1)))
  df[i,]=list_i
}

So in this DF, for example, playerID 3, 8 and 9 should have their answers set to "disc" from q4 onwards, whereas player 5 should have "disc" from q8 onwards. In other words, whenever there are 3 consecutive incorrect answers (counting values of 2 as incorrect), all following answers should change to "disc".

I presume the solution would be a for loop with an if statement inside, using mutate or similar.
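The for-loop-with-if approach can be sketched in plain base R, without `mutate()`. This is a minimal sketch on a small hypothetical data frame; the names `quiz`, `k` and `out` are illustrative and not from the question. It walks each player's answers left to right, keeps a count of the current run of incorrect answers, and overwrites every answer after the run reaches `k`.

```r
# Hypothetical example data: playerID plus answers (1 = correct, 0/2 = incorrect)
quiz <- data.frame(playerID = 1:3,
                   q1 = c(0, 1, 2), q2 = c(2, 0, 1), q3 = c(0, 0, 2),
                   q4 = c(1, 2, 2), q5 = c(1, 1, 1))

k <- 3                                            # consecutive misses that trigger "disc"
q_cols <- grep("^q", names(quiz))                 # positions of the question columns
out <- quiz
out[q_cols] <- lapply(out[q_cols], as.character)  # columns must be able to hold "disc"

for (i in seq_len(nrow(out))) {
  streak <- 0                                     # current run of incorrect answers
  for (j in seq_along(q_cols)) {
    if (streak >= k) {
      out[i, q_cols[j]] <- "disc"                 # run already completed: discard answer
    } else if (quiz[i, q_cols[j]] == 1) {
      streak <- 0                                 # a correct answer resets the run
    } else {
      streak <- streak + 1                        # 0 or 2 extends the run
    }
  }
}
out
```

Player 1 misses q1 to q3 and gets "disc" in q4 and q5, player 2 completes a run at q4 and gets "disc" only in q5, and player 3 never has three misses in a row, so that row is unchanged.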

Answer:

Are you looking for something like this?

library(tidyverse)

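n <- 100

# Replace each element of v with new_value once the running total
# (cumulative sum) of v exceeds cap; elements before that are kept.
# Note: ifelse() returns a character vector as soon as any element is replaced.
f <- function(v, cap, new_value){
  df <- 
    data.frame(v = v) |> 
    mutate(
      b = cumsum(v),                        # running total up to each position
      v_new = ifelse(b > cap, new_value, v) # overwrite once the total passes cap
      )
  return(df$v_new)
}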

# apply function to vector
v <- runif(n)
v_new <- f(v, 5, "disc")

# apply function in a dataframe with mutate 
df <-
  data.frame(a = runif(n))
df |> 
  mutate(
    b = f(a, 5, "disc")
  )
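The answer above sums the raw values, which with this coding would also count correct answers (1) and give unanswered questions (2) double weight. If the intended rule is a cumulative total of incorrect answers rather than a consecutive run, a minimal adaptation of the same `cumsum()` idea could look like this. `f_total()` is a hypothetical name, and it keeps the answer that brings the total to `cap` and discards only the ones after it.

```r
# Hypothetical variant of f(): v holds one player's answers (1 = correct,
# 0/2 = incorrect); answers after the cap-th incorrect one become new_value.
f_total <- function(v, cap, new_value = "disc") {
  n_wrong <- cumsum(v != 1)             # running total of incorrect answers
  reached <- n_wrong >= cap             # TRUE from the cap-th incorrect answer on
  after <- c(FALSE, head(reached, -1))  # shift by one so that answer itself is kept
  ifelse(after, new_value, as.character(v))
}

f_total(c(0, 1, 2, 1, 0, 1, 1), cap = 3)
# returns c("0", "1", "2", "1", "0", "disc", "disc")
```

Note that this vector never contains three misses in a row, so under the consecutive rule it would be left unchanged; the two interpretations really do differ.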

Answer:

One possible solution using mutate and across:

library(dplyr)

df %>%
  ungroup() %>%
  mutate(
    # Mutate across all question columns
    across(
      starts_with("q"),
      function(col) {
        # Position of the current column (column 1 is playerID).
        # cur_data() is deprecated as of dplyr 1.1.0 in favour of pick().
        col_i <- which(names(cur_data())==cur_column())
        # q1 has no previous answers, and 2:(col_i-1) would count backwards
        if (col_i <= 2) return(col)
        previous_cols <- 2:(col_i-1)
        
        # Encode each previous answer as 1 if incorrect (0 or 2) and 0 if correct,
        # then paste them into one string per player
        previous_qs <- select(cur_data(), all_of(previous_cols)) %>%
          mutate(across(everything(), ~as.numeric(.x %in% c(0,2)))) %>%
          tidyr::unite("str", sep = "") %>%
          pull(str)

        # Check for three successive incorrect answers at some previous point
        results <- grepl(pattern = "111", previous_qs)
        
        # For those with three successive incorrect answers at some previous point, overwrite value with 'disc'
        col[results] <- "disc"
        col
      }
    )
  )
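The answer above hard-codes the pattern "111", so it only handles runs of three. A minimal base-R sketch of the same string-matching idea with the run length as a parameter is below. `disc_regex()` is a hypothetical name; it assumes the question columns are the only columns whose names start with `q`, and it uses `regexpr()` to find where the first run of `k` misses starts in each player's string.

```r
# Hypothetical generalisation of the "111" idea: k is the run length.
disc_regex <- function(df, k = 3, new_value = "disc") {
  q_cols <- grep("^q", names(df))                    # question columns, in order
  answers <- as.matrix(df[q_cols])
  # One string per player: "1" for an incorrect answer (0 or 2), "0" for correct
  miss_str <- apply(answers != 1, 1, function(r) paste(as.integer(r), collapse = ""))
  start <- regexpr(strrep("1", k), miss_str)         # -1 where there is no such run
  out <- df
  out[q_cols] <- lapply(out[q_cols], as.character)
  for (i in which(start > 0)) {
    last <- start[i] + k - 1                         # question index ending the run
    if (last < length(q_cols)) {
      out[i, q_cols[(last + 1):length(q_cols)]] <- new_value
    }
  }
  out
}

# Small hypothetical example
quiz <- data.frame(playerID = 1:3,
                   q1 = c(0, 1, 2), q2 = c(2, 0, 1), q3 = c(0, 0, 2),
                   q4 = c(1, 2, 2), q5 = c(1, 1, 1))
disc_regex(quiz, k = 3)
```

With the question's `df`, a threshold of four would be `disc_regex(df, k = 4)`.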