I have a student dataset that records each response to a question as right or wrong. There is also a time variable in seconds. I would like to create time flags that count the number of correct and incorrect responses up to the 1-minute, 2-minute, and 3-minute thresholds. Here is a sample dataset.
df <- data.frame(id = c(1,2,3,4,5),
                 gender = c("m","f","m","f","m"),
                 age = c(11,12,12,13,14),
                 i1 = c(1,0,NA,1,0),
                 i2 = c(0,1,0,"1]",1),
                 i3 = c("1]",1,"1]",0,"0]"),
                 i4 = c(0,"0]",1,1,0),
                 i5 = c(1,1,NA,"0]","1]"),
                 i6 = c(0,0,"0]",1,1),
                 i7 = c(1,"1]",1,0,0),
                 i8 = c(0,0,0,"1]","1]"),
                 i9 = c(1,1,1,0,NA),
                 time = c(115,138,148,195,225))
> df
id gender age i1 i2 i3 i4 i5 i6 i7 i8 i9 time
1 1 m 11 1 0 1] 0 1 0 1 0 1 115
2 2 f 12 0 1 1 0] 1 0 1] 0 1 138
3 3 m 12 NA 0 1] 1 <NA> 0] 1 0 1 148
4 4 f 13 1 1] 0 1 0] 1 0 1] 0 195
5 5 m 14 0 1 0] 0 1] 1 0 1] NA 225
The minute thresholds are marked by a ] sign to the right of the score. For example, for id = 3, the 1-minute threshold is at item i3 and the 2-minute threshold is at item i6. Each student might have different time thresholds.
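To make the marker convention concrete, here is a minimal base R sketch that locates the thresholds for a single student. The vector `x` is copied by hand from the id = 3 row of the sample data, and the names `x` and `marks` are only illustrative.

```r
# Responses for id = 3, copied from the sample data (items i1..i9)
x <- c(NA, "0", "1]", "1", NA, "0]", "1", "0", "1")
names(x) <- paste0("i", 1:9)

# Positions whose score carries the "]" marker; NA entries never match
marks <- which(grepl("]", x, fixed = TRUE))

# Items at which the 1-minute and 2-minute thresholds fall
names(x)[marks]
```

For this student the sketch returns i3 and i6, and there is no third marker because the total time is under three minutes.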
I need to create flag variables that count the number of correct and incorrect responses up to the 1-minute, 2-minute, and 3-minute thresholds.
How can I produce the desired dataset shown below?
> df1
id gender age i1 i2 i3 i4 i5 i6 i7 i8 i9 time one_true one_false two_true two_false three_true three_false
1 1 m 11 1 0 1] 0 1 0 1 0 1 115 2 1 NA NA NA NA
2 2 f 12 0 1 1 0] 1 0 1] 0 1 138 2 2 4 3 NA NA
3 3 m 12 NA 0 1] 1 <NA> 0] 1 0 1 148 1 1 2 2 NA NA
4 4 f 13 1 1] 0 1 0] 1 0 1] 0 195 2 0 3 2 5 3
5 5 m 14 0 1 0] 0 1] 1 0 1] NA 225 1 2 2 3 4 4
CodePudding user response:
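library(tidyverse)

df %>%
  # long format: one row per student and item, all scores as character
  pivot_longer(i1:i9, values_transform = as.character) %>%
  group_by(id) %>%
  # vs = number of "]" markers at or after each item
  mutate(vs = rev(cumsum(replace_na(str_detect(rev(value), fixed("]")), FALSE)))) %>%
  # drop items answered after the last marker
  filter(vs > 0) %>%
  # renumber so that the first minute is 1, the second is 2, and so on
  mutate(vs = max(vs) - vs + 1) %>%
  group_by(vs, .add = TRUE) %>%
  summarise(true = sum(str_detect(value, "1"), na.rm = TRUE),
            false = sum(str_detect(value, "0"), na.rm = TRUE),
            .groups = "drop_last") %>%
  # running totals across minutes within each student
  mutate(across(c(true, false), cumsum)) %>%
  pivot_wider(id_cols = id, names_from = vs, values_from = c(true, false))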
# A tibble: 5 x 7
# Groups: id [5]
id true_1 true_2 true_3 false_1 false_2 false_3
<dbl> <int> <int> <int> <int> <int> <int>
1 1 2 NA NA 1 NA NA
2 2 2 4 NA 2 3 NA
3 3 1 2 NA 1 2 NA
4 4 2 3 5 0 2 3
5 5 1 2 4 2 3 4
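The `rev(cumsum(rev(...)))` step is probably the least obvious part of that pipeline, so here is a small base R sketch of the same idea on one student. The vector is copied by hand from the id = 5 row, and `has_mark` and `segment` are illustrative names that do not appear in the answer.

```r
# Responses for id = 5, copied from the sample data (items i1..i9)
value <- c("0", "1", "0]", "0", "1]", "1", "0", "1]", NA)

# TRUE where the score carries the "]" marker; NA counts as FALSE
has_mark <- grepl("]", value, fixed = TRUE)

# Number of markers at or after each position (a reversed running count)
vs <- rev(cumsum(rev(has_mark)))

# Minute index per item; items after the last marker get NA
segment <- ifelse(vs > 0, max(vs) - vs + 1, NA)
segment
```

Counting markers from the right means every item up to and including a marker shares that marker's count, which is why the marker item itself lands in the minute it closes rather than in the next one.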
CodePudding user response:
You could also accomplish the same in base R:
fun <- function(x){
  # lengths of the segments that end at each "]" marker
  a <- diff(c(0, which(grepl("]", x))))
  # count entries matching a pattern, ignoring NA
  f_sum <- function(x, y) sum(na.omit(grepl(x, y)))
  fn <- function(x) c(true = f_sum('1', x), false = f_sum('0', x))
  # per-segment counts, then running totals across segments
  y <- tapply(x[seq(sum(a))], rep(seq_along(a), a), fn)
  s <- do.call(rbind, Reduce("+", y, accumulate = TRUE))
  nms <- do.call(paste, c(sep = '_', expand.grid(colnames(s), seq(nrow(s)))))
  setNames(c(t(s)), nms)
}
fun2 <- function(x){
  # pad every student's result with NA to the longest one and stack the rows
  ln <- lengths(x)
  nms <- names(x[[which.max(ln)]])
  do.call(rbind, lapply(x, function(x) setNames(`length<-`(x, max(ln)), nms)))
}
fun2(apply(df[4:12], 1, fun))
true_1 false_1 true_2 false_2 true_3 false_3
[1,] 2 1 NA NA NA NA
[2,] 2 2 4 3 NA NA
[3,] 1 1 2 2 NA NA
[4,] 2 0 3 2 5 3
[5,] 1 2 2 3 4 4
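Neither answer produces the one_true, one_false, ... column names requested in the question. As a rough sketch of one way to get there, the block below renames the columns and binds them to an id column. The `counts` matrix is typed in from the output above rather than recomputed, and `df_ids` is a hypothetical stand-in for the original data frame.

```r
# Counts copied from the printed output above (not recomputed here)
counts <- matrix(
  c(2, 1, NA, NA, NA, NA,
    2, 2,  4,  3, NA, NA,
    1, 1,  2,  2, NA, NA,
    2, 0,  3,  2,  5,  3,
    1, 2,  2,  3,  4,  4),
  nrow = 5, byrow = TRUE,
  dimnames = list(NULL, c("true_1", "false_1", "true_2",
                          "false_2", "true_3", "false_3"))
)

# Turn "true_1" into "one_true", "false_2" into "two_false", and so on
words <- c("one", "two", "three")
parts <- strsplit(colnames(counts), "_", fixed = TRUE)
colnames(counts) <- vapply(
  parts,
  function(p) paste(words[as.integer(p[2])], p[1], sep = "_"),
  character(1)
)

# Stand-in for the original data; cbind(df, counts) would work the same way
df_ids <- data.frame(id = 1:5)
df1 <- cbind(df_ids, counts)
names(df1)
```

With the real data, replacing `df_ids` by `df` should give the layout asked for in the question, assuming the rows of the counts are in the same order as the rows of `df`.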