Matrix of multiple choice questions in R

Time:10-22

I have two groups of multiple choice questions, a and b. Each group contains 3 questions, so I have 6 columns: a1, a2, a3, b1, b2, b3. I'm trying to build a matrix that counts, for every pair of questions, how many respondents answered both, e.g. a1 x b1: 1 answer, a1 x b2: 3 answers, etc. So I hope to get a new table where a1:a3 are the columns, b1:b3 are the rows, and each cell holds the number of crossings.

How could I change the code below to get a new df with the number of crossings between 6 questions? The result of the code differs from what I hope to get.

The table I'd like to get (yellow means number of crossings)

[Image: the desired table of crossing counts]

library(tidyverse)  # also attaches dplyr and tidyr

a1 <- c("x1", NA, "x1", NA, "x1")
a2 <- c(NA, "x2", "x2", "x2", NA)
a3 <- c(NA, "x3", NA, "x3", NA)

b1 <- c("y1", "y1", NA, "y1", NA)
b2 <- c("y2", NA, "y2", NA, "y2")
b3 <- c("y3", NA, "y3", "y3", NA)

testdf1 <- data.frame(cbind(a1, a2, a3, b1, b2, b3))

testdf2 <- testdf1 %>%
  pivot_longer(cols = -c(b1:b3)) %>%
  group_by(b1, b2, b3, value) %>%
  summarise(N=n()) %>%
  ungroup() %>%
  drop_na() %>%
  pivot_wider(names_from = c("b1", "b2", "b3"), values_from = "N")

CodePudding user response:

Use crossprod on the "answered" indicators (!is.na) of the two groups. With 6 questions, just select the columns of each group:

crossprod(!is.na(testdf1[1:3]), !is.na(testdf1[4:6]))
   b1 b2 b3
a1  1  3  2
a2  2  1  2
a3  2  0  1
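
Why this works, as a minimal sketch: `!is.na()` turns each group into a TRUE/FALSE matrix, and `crossprod(A, B)` computes `t(A) %*% B`, so each cell sums over respondents the product of two indicators, i.e. it counts the rows where both questions were answered. The names `via_crossprod` and `via_matmul` are only illustrative and not part of the original answer.

```r
# Rebuild the example data from the question
testdf1 <- data.frame(
  a1 = c("x1", NA, "x1", NA, "x1"),
  a2 = c(NA, "x2", "x2", "x2", NA),
  a3 = c(NA, "x3", NA, "x3", NA),
  b1 = c("y1", "y1", NA, "y1", NA),
  b2 = c("y2", NA, "y2", NA, "y2"),
  b3 = c("y3", NA, "y3", "y3", NA)
)

# Logical indicator matrices: TRUE where a respondent gave an answer
A <- !is.na(testdf1[1:3])
B <- !is.na(testdf1[4:6])

# crossprod(A, B) is the same computation as t(A) %*% B
via_crossprod <- crossprod(A, B)
via_matmul <- t(A) %*% B

# Both give the same counts
all(via_crossprod == via_matmul)
```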

This can be further generalized as:

list1 <- split.default(testdf1, sub("\\d+", '', names(testdf1)))
do.call(crossprod, unname(lapply(list1, Negate(is.na))))

   b1 b2 b3
a1  1  3  2
a2  2  1  2
a3  2  0  1
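
To see what the generalized version does step by step, here is a rough sketch. It assumes the column names follow the "letter prefix + number" pattern from the question and that there are exactly two groups; `groups`, `indicators` and `result` are illustrative names, not part of the original answer.

```r
# Rebuild the example data from the question
testdf1 <- data.frame(
  a1 = c("x1", NA, "x1", NA, "x1"),
  a2 = c(NA, "x2", "x2", "x2", NA),
  a3 = c(NA, "x3", NA, "x3", NA),
  b1 = c("y1", "y1", NA, "y1", NA),
  b2 = c("y2", NA, "y2", NA, "y2"),
  b3 = c("y3", NA, "y3", "y3", NA)
)

# Strip the trailing digits so each column is labelled by its group
groups <- sub("\\d+", "", names(testdf1))
groups
# "a" "a" "a" "b" "b" "b"

# split.default() splits the data frame column-wise, one data frame per group
list1 <- split.default(testdf1, groups)
names(list1)
# "a" "b"

# Turn each piece into an "answered" indicator matrix, then cross-multiply
indicators <- lapply(list1, Negate(is.na))
result <- crossprod(indicators$a, indicators$b)
result
```

The `do.call(crossprod, unname(...))` form in the answer passes the two list elements positionally, so it relies on the list containing exactly two groups.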

CodePudding user response:

Using outer with a Vectorized function.

A <- !is.na(testdf1[1:3])
B <- !is.na(testdf1[4:6])

outer((sa <- seq_len(ncol(A))), (sb <- seq_len(ncol(B))), 
      Vectorize(\(x, y) sum(A[, x] + B[, y] == 2))) |>
  `dimnames<-`(list(paste0('a', sa), paste0('b', sb)))
#    b1 b2 b3
# a1  1  3  2
# a2  2  1  2
# a3  2  0  1
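
A slightly more explicit variant of the same idea, as a sketch: outer() calls its function once with whole vectors of index pairs, which is why the per-pair function has to be wrapped in Vectorize(). Here the test "sum of two indicators equals 2" is written as a logical AND instead. `count_pair` and `res` are hypothetical names used only for illustration.

```r
# Rebuild the example data from the question
testdf1 <- data.frame(
  a1 = c("x1", NA, "x1", NA, "x1"),
  a2 = c(NA, "x2", "x2", "x2", NA),
  a3 = c(NA, "x3", NA, "x3", NA),
  b1 = c("y1", "y1", NA, "y1", NA),
  b2 = c("y2", NA, "y2", NA, "y2"),
  b3 = c("y3", NA, "y3", "y3", NA)
)

A <- !is.na(testdf1[1:3])
B <- !is.na(testdf1[4:6])

# Count respondents who answered both column x of A and column y of B
count_pair <- function(x, y) sum(A[, x] & B[, y])

# Vectorize() lets outer() apply count_pair to every (x, y) index pair
res <- outer(seq_len(ncol(A)), seq_len(ncol(B)), Vectorize(count_pair))
dimnames(res) <- list(colnames(A), colnames(B))
res
```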