This relates to another problem I posted, but I did not quite ask the right question. If anyone can help with this, it would really be appreciated.
I have a DF with several players' answers to 100 questions in a quiz. The example data frame below has 10 questions and 10 players. It is not the real data (which does not actually come from a quiz), but the principle is the same.
My goal is to create a function that checks when a player has answered 3 questions incorrectly cumulatively at any point during their answers, and then changes all of their following answers to the string "disc". I would also like the threshold to be a parameter, so that it could be 4 or 5 incorrect questions, etc. In the df, 1 = correct, 0 = incorrect and 2 = unanswered. Unanswered counts as incorrect, but I do not want to recode it as 0.
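A minimal sketch of the cumulative reading of this rule, for one player's answers given as a numeric vector. `disc_after_cumulative()` is a hypothetical name, not from the question. The idea is to count incorrect answers (0 or 2, without recoding 2) with `cumsum()`, shift that count by one position so each answer only sees the mistakes made before it, and overwrite every answer once that count reaches `n_wrong`.

```r
# Hypothetical helper (not from the question): cumulative reading of the rule.
disc_after_cumulative <- function(answers, n_wrong = 3, new_value = "disc") {
  incorrect <- answers %in% c(0, 2)                   # 2 (unanswered) counts as incorrect, no recoding
  wrong_before <- cumsum(c(0, head(incorrect, -1)))   # incorrect answers before each position
  out <- as.character(answers)                        # character vector so "disc" fits
  out[wrong_before >= n_wrong] <- new_value           # every answer after the n_wrong-th mistake
  out
}

disc_after_cumulative(c(1, 0, 1, 2, 1, 0, 1, 1))
# returns c("1", "0", "1", "2", "1", "0", "disc", "disc")
```

The threshold is just the `n_wrong` argument, so `disc_after_cumulative(x, n_wrong = 5)` covers the "4 or 5 questions" case.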
df=data.frame(playerID=numeric(),
q1=numeric(),
q2=numeric(),
q3=numeric(),
q4=numeric(),
q5=numeric(),
q6=numeric(),
q7=numeric(),
q8=numeric(),
q9=numeric(),
q10=numeric())
set.seed(1)
for(i in 1:10){
  list_i=c(i,sample(0:2,1),sample(0:2,1),sample(0:2,1),sample(0:2,1),sample(0:2,1),sample(0:2,1),sample(0:2,1),sample(0:2,1),sample(0:2,1),sample(0:2,1))
  df[i,]=list_i
}
So, in this DF, for example, players 3, 8 and 9 should have their answers set to "disc" from q4 onwards, whereas player 5 should have "disc" from q8 onwards. In other words, whenever there are 3 consecutive incorrect answers (including values of 2), all following answers should change to "disc".
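If the intended rule is the consecutive one described here, rather than a cumulative count, one way to sketch it for a single player's answers is to track the length of the current run of incorrect answers and stop at the first run of `n_wrong`. `disc_after_run()` is a hypothetical name; 0 and 2 both count as incorrect, and everything after the end of that first run is replaced.

```r
# Hypothetical helper: replace every answer after the first run of
# n_wrong consecutive incorrect answers (0 = incorrect, 2 = unanswered).
disc_after_run <- function(answers, n_wrong = 3, new_value = "disc") {
  incorrect <- answers %in% c(0, 2)
  out <- as.character(answers)
  streak <- 0
  for (j in seq_along(answers)) {
    streak <- if (incorrect[j]) streak + 1 else 0   # length of the current run
    if (streak >= n_wrong) {
      if (j < length(answers)) {
        out[(j + 1):length(answers)] <- new_value   # everything after the run
      }
      break
    }
  }
  out
}

disc_after_run(c(0, 2, 0, 1, 1))
# returns c("0", "2", "0", "disc", "disc")
```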
I presume the solution would be a for loop with an if statement inside, using mutate or something similar.
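A plain for loop with an if statement does work without mutate. Below is a sketch that loops over players (rows) and questions (columns), assuming, as in the example df, that the question columns are the ones whose names start with "q". It applies the consecutive rule from the expected output above; `mark_disc()` and the small `toy` data frame are hypothetical names used only for illustration.

```r
# Hypothetical nested-loop version: rows are players, "q" columns are questions.
mark_disc <- function(df, n_wrong = 3, new_value = "disc") {
  q_cols <- grep("^q", names(df))                    # question column positions
  out <- df
  out[q_cols] <- lapply(out[q_cols], as.character)   # character columns can hold "disc"
  for (i in seq_len(nrow(df))) {                     # one player per row
    streak <- 0                                      # current run of incorrect answers
    for (k in seq_along(q_cols)) {
      if (df[i, q_cols[k]] %in% c(0, 2)) {
        streak <- streak + 1
      } else {
        streak <- 0
      }
      if (streak >= n_wrong) {
        later <- q_cols[seq_along(q_cols) > k]       # questions after the run
        if (length(later) > 0) out[i, later] <- new_value
        break                                        # this player is done
      }
    }
  }
  out
}

toy <- data.frame(playerID = 1:2,
                  q1 = c(0, 1), q2 = c(2, 0), q3 = c(0, 0), q4 = c(1, 1), q5 = c(1, 0))
mark_disc(toy)
```

Here player 1 has q4 and q5 set to "disc", while player 2 (never more than 2 incorrect in a row) is unchanged apart from the columns becoming character.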
Answer:
Are you looking for something like this?
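library(tidyverse)
n <- 100
# Replace every value whose running total exceeds `cap` with `new_value`.
# cumsum(v) adds up the values themselves; with the quiz codes you would
# accumulate an incorrect-answer indicator instead, e.g. cumsum(v %in% c(0, 2)).
f <- function(v, cap, new_value){
  df <-
    data.frame(v = v) |>   # native pipe |> needs R >= 4.1
    mutate(
      b = cumsum(v),
      v_new = ifelse(b > cap, new_value, v)
    )
  return(df$v_new)
}
# apply function to vector
v <- runif(n)
v_new <- f(v, 5, "disc")
# apply function in a dataframe with mutate
df <-
  data.frame(a = runif(n))
df |>
  mutate(
    b = f(a, 5, "disc")
  )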
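As written, f() sums the raw values, so with the quiz codes (0/1/2) the running total would also grow with correct answers. It also works on one vector (a column), while each player's answers run across a row. A sketch that adapts the same cumsum() + ifelse() idea, counting only incorrect answers and applying it row by row with base R's apply(). `f_wrong()` and `toy` are hypothetical names, and this follows the cumulative reading of the question.

```r
# Hypothetical base R variant of f(): the running total counts incorrect
# answers (0 or 2) that come before each position, not the raw values.
f_wrong <- function(v, cap, new_value) {
  wrong_before <- cumsum(c(0, head(v %in% c(0, 2), -1)))  # incorrect answers before each position
  ifelse(wrong_before >= cap, new_value, v)                # replace answers once the cap is reached
}

toy <- data.frame(playerID = 1:2,
                  q1 = c(0, 1), q2 = c(2, 1), q3 = c(1, 0), q4 = c(0, 1), q5 = c(1, 1))
q_cols <- grep("^q", names(toy))
# apply() passes each player's row to f_wrong(); t() turns the result back into one row per player
res <- t(apply(toy[q_cols], 1, f_wrong, cap = 3, new_value = "disc"))
for (k in seq_along(q_cols)) toy[[q_cols[k]]] <- res[, k]
toy
```

Because apply() simplifies the rows into a single matrix, all question columns become character as soon as any player reaches the cap.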
Answer:
One possible solution using mutate() and across():
library(dplyr)

n_wrong <- 3  # number of consecutive incorrect answers that triggers "disc"

df %>%
  ungroup() %>%
  mutate(
    # Mutate across all question columns
    across(
      starts_with("q"),
      function(col) {
        # Position of the current column (cur_data() is deprecated since
        # dplyr 1.1.0 in favour of pick(), but still works)
        col_i <- which(names(cur_data()) == cur_column())
        # q1 has no previous questions, so leave it unchanged
        if (col_i <= 2) return(col)
        previous_cols <- 2:(col_i - 1)
        # Previous answers as a string of 1s (incorrect: 0 or 2) and 0s (correct)
        previous_qs <- select(cur_data(), all_of(previous_cols)) %>%
          mutate(across(everything(), ~as.numeric(.x %in% c(0, 2)))) %>%
          tidyr::unite("str", sep = "") %>%
          pull(str)
        # Check for n_wrong successive incorrect answers at some previous point
        results <- grepl(pattern = strrep("1", n_wrong), previous_qs)
        # For those rows, overwrite the current answer with "disc"
        col[results] <- "disc"
        col
      }
    )
  )
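The same string-matching idea can be run once per player instead of once per column. A sketch in base R, assuming the question columns start with "q" and that `n_wrong` consecutive incorrect answers (0 or 2) trigger "disc". `mark_disc_regex()` and `toy` are hypothetical names. regexpr() returns the start of the first match (or -1 when there is none), so the first "disc" column is that start plus `n_wrong`.

```r
# Hypothetical vectorised variant: one 0/1 string per player, one regexpr() call.
mark_disc_regex <- function(df, n_wrong = 3, new_value = "disc") {
  q_cols <- grep("^q", names(df))                       # question column positions
  m <- as.matrix(df[q_cols])
  wrong <- (m == 0 | m == 2) * 1L                       # 1 = incorrect or unanswered
  s <- apply(wrong, 1, paste, collapse = "")            # one string per player, e.g. "11100"
  start <- as.vector(regexpr(strrep("1", n_wrong), s))  # start of first run, -1 if none
  first_disc <- ifelse(start > 0, start + n_wrong, Inf) # first question after the run
  after <- col(m) >= first_disc                         # TRUE for answers to overwrite
  out <- df
  for (k in seq_along(q_cols)) {
    values <- as.character(m[, k])
    values[after[, k]] <- new_value
    out[[q_cols[k]]] <- values
  }
  out
}

toy <- data.frame(playerID = 1:3,
                  q1 = c(0, 1, 0), q2 = c(2, 0, 1), q3 = c(0, 0, 0),
                  q4 = c(1, 1, 2), q5 = c(1, 0, 2))
mark_disc_regex(toy)
```

Player 3's run of three incorrect answers ends at the last question, so nothing is left to overwrite and the row is unchanged apart from the column type.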