How to count the number of response values by time thresholds in R

I have a student dataset in which responses to questions are scored as right (1) or wrong (0). There is also a time variable in seconds. I would like to create flag variables that record the number of correct and incorrect responses up to the 1-minute, 2-minute, and 3-minute thresholds. Here is a sample dataset.

df <- data.frame(id = c(1,2,3,4,5),
                 gender = c("m","f","m","f","m"),
                 age = c(11,12,12,13,14),
                 i1 = c(1,0,NA,1,0),
                 i2 = c(0,1,0,"1]",1),
                 i3 = c("1]",1,"1]",0,"0]"),
                 i4 = c(0,"0]",1,1,0),
                 i5 = c(1,1,NA,"0]","1]"),
                 i6 = c(0,0,"0]",1,1),
                 i7 = c(1,"1]",1,0,0),
                 i8 = c(0,0,0,"1]","1]"),
                 i9 = c(1,1,1,0,NA),
                 time = c(115,138,148,195, 225))


 > df
  id gender age i1 i2 i3 i4   i5 i6 i7 i8 i9 time
1  1      m  11  1  0 1]  0    1  0  1  0  1  115
2  2      f  12  0  1  1 0]    1  0 1]  0  1  138
3  3      m  12 NA  0 1]  1 <NA> 0]  1  0  1  148
4  4      f  13  1 1]  0  1   0]  1  0 1]  0  195
5  5      m  14  0  1 0]  0   1]  1  0 1] NA  225

Each minute threshold is marked by a ] appended to the right of the score.

For example, for id = 3 the 1-minute threshold is at item i3 and the 2-minute threshold is at item i6. The thresholds can fall on different items for each student.
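
As a minimal sketch of how the marker can be read, the snippet below locates the items that carry a ] in one student's responses. The helper name threshold_positions is hypothetical, and the vector row3 is typed in by hand from the id = 3 row of the sample data.

```r
# Hypothetical helper (not part of the question): index of every item whose
# score carries the "]" threshold marker. NA responses never match.
threshold_positions <- function(x) unname(which(grepl("]", x, fixed = TRUE)))

# Responses of id = 3, copied by hand from the sample data
row3 <- c(i1 = NA, i2 = "0", i3 = "1]", i4 = "1", i5 = NA,
          i6 = "0]", i7 = "1", i8 = "0", i9 = "1")

pos <- threshold_positions(row3)
names(row3)[pos]   # items at the 1-minute and 2-minute thresholds
```

The first position closes the 1-minute window and the second closes the 2-minute window, so counting correct and incorrect responses up to each position gives the requested flags.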

I need to create flag variables that count the number of correct and incorrect responses up to the 1-minute, 2-minute, and 3-minute thresholds.

How can I produce the desired dataset shown below?

> df1
  id gender age i1 i2 i3 i4   i5 i6 i7 i8 i9 time one_true one_false two_true two_false three_true three_false
1  1      m  11  1  0 1]  0    1  0  1  0  1  115        2         1       NA        NA         NA          NA
2  2      f  12  0  1  1 0]    1  0 1]  0  1  138        2         2        4         3         NA          NA
3  3      m  12 NA  0 1]  1 <NA> 0]  1  0  1  148        1         1        2         2         NA          NA
4  4      f  13  1 1]  0  1   0]  1  0 1]  0  195        2         0        3         2          5           3
5  5      m  14  0  1 0]  0   1]  1  0 1] NA  225        1         2        2         3          4           4

CodePudding user response：

library(tidyverse)

df %>%
  pivot_longer(i1:i9,values_transform = as.character) %>%
  group_by(id)%>%
  mutate(vs = rev(cumsum(replace_na(str_detect(rev(value), fixed(']')), FALSE)))) %>%
  filter(vs > 0)%>%
  mutate(vs = max(vs) - vs + 1)%>%
  group_by(vs,.add = TRUE)%>%
  summarise(true = sum(str_detect(value, '1'), na.rm = TRUE),
            false =  sum(str_detect(value, '0'), na.rm = TRUE),
            .groups = "drop_last")%>%
  mutate(across(c(true, false),cumsum)) %>%
  pivot_wider(id_cols = id, names_from = vs, values_from = c(true, false))

# A tibble: 5 x 7
# Groups:   id [5]
     id true_1 true_2 true_3 false_1 false_2 false_3
  <dbl>  <int>  <int>  <int>   <int>   <int>   <int>
1     1      2     NA     NA       1      NA      NA
2     2      2      4     NA       2       3      NA
3     3      1      2     NA       1       2      NA
4     4      2      3      5       0       2       3
5     5      1      2      4       2       3       4
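
The reversed cumulative sum in the pipeline above can be traced on a single student. The following base R sketch is an illustration only, not the answer's own code: the vector value is typed in by hand from the id = 4 row, and the names marker, keep, and segment are assumptions introduced here.

```r
# Responses of id = 4 (items i1..i9), copied by hand from the sample data
value <- c("1", "1]", "0", "1", "0]", "1", "0", "1]", "0")

# TRUE where an item carries the threshold marker
marker <- grepl("]", value, fixed = TRUE)

# Counting markers from the right labels every item with the number of
# thresholds still ahead of it (including its own); items after the last
# marker get 0
vs <- rev(cumsum(rev(marker)))   # 3 3 2 2 2 1 1 1 0

# Drop the items after the last marker, then flip the labels so the first
# minute becomes segment 1
keep    <- vs > 0
segment <- max(vs[keep]) - vs[keep] + 1

# Count per segment, then accumulate so each threshold includes earlier ones
cum_true  <- cumsum(tapply(grepl("1", value[keep], fixed = TRUE), segment, sum))
cum_false <- cumsum(tapply(grepl("0", value[keep], fixed = TRUE), segment, sum))
```

The cumulative counts agree with row 4 of the tibble above: 2, 3, 5 correct and 0, 2, 3 incorrect.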

CodePudding user response：

You could also accomplish the same in base R:

fun <- function(x){
  a <- diff(c(0,which(grepl("]", x))))
  f_sum <- function(x,y) sum(na.omit(grepl(x,y)))
  fn <- function(x) c(true = f_sum('1',x), false = f_sum('0',x))
  y <- tapply(x[seq(sum(a))], rep(seq_along(a),a), fn)
  s <- do.call(rbind, Reduce("+", y, accumulate = TRUE))
  nms <- do.call(paste, c(sep='_',expand.grid(colnames(s), seq(nrow(s)))))
  setNames(c(t(s)), nms)
}

fun2 <- function(x){
  ln <- lengths(x)
  nms <- names(x[[which.max(ln)]])
  do.call(rbind, lapply(x, function(x)setNames(`length<-`(x,max(ln)),nms)))
}


fun2(apply(df[4:12],1,fun))
     true_1 false_1 true_2 false_2 true_3 false_3
[1,]      2       1     NA      NA     NA      NA
[2,]      2       2      4       3     NA      NA
[3,]      1       1      2       2     NA      NA
[4,]      2       0      3       2      5       3
[5,]      1       2      2       3      4       4
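
Neither result uses the column names requested in the question (one_true, one_false, and so on) or is attached to the original data. One possible way to finish, sketched under assumptions: res is typed in by hand from the matrix printed above, the lookup vector words is hypothetical, and a one-column data frame stands in for df.

```r
# Result matrix, copied by hand from the output above
res <- matrix(c(2, 1, NA, NA, NA, NA,
                2, 2,  4,  3, NA, NA,
                1, 1,  2,  2, NA, NA,
                2, 0,  3,  2,  5,  3,
                1, 2,  2,  3,  4,  4),
              ncol = 6, byrow = TRUE,
              dimnames = list(NULL, c("true_1", "false_1", "true_2",
                                      "false_2", "true_3", "false_3")))

# Hypothetical lookup from the minute index to the word used in the question
words <- c("1" = "one", "2" = "two", "3" = "three")

# "true_1" -> "one_true", "false_2" -> "two_false", ...
parts <- strsplit(colnames(res), "_", fixed = TRUE)
colnames(res) <- vapply(parts,
                        function(p) paste(words[p[2]], p[1], sep = "_"),
                        character(1))

# Stand-in for the original df; cbind(df, res) would work the same way
df1 <- cbind(data.frame(id = 1:5), res)
```

Binding by position relies on the rows of res being in the same order as the rows of df, which holds here because apply() walks the rows in order.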