I have two groups of multiple-choice questions, a and b. Each group contains 3 questions, so I have 6 columns: a1, a2, a3, b1, b2, b3. I am trying to build a matrix of answer counts, e.g. a1 x b1: 1 answer, a1 x b2: 3 answers, and so on. In other words, I hope to get a new table with a1:a3 as columns and b1:b3 as rows, where each cell holds the number of crossings.
How can I change the code below to get a new df with the number of crossings between the 6 questions? The result of my code differs from what I hope to get.
The table I'd like to get (yellow marks the number of crossings):
library(tidyverse)
library(dplyr)
a1 <- c("x1", NA, "x1", NA, "x1")
a2 <- c(NA, "x2", "x2", "x2", NA)
a3 <- c(NA, "x3", NA, "x3", NA)
b1 <- c("y1", "y1", NA, "y1", NA)
b2 <- c("y2", NA, "y2", NA, "y2")
b3 <- c("y3", NA, "y3", "y3", NA)
testdf1 <- data.frame(cbind(a1, a2, a3, b1, b2, b3))
testdf2 <- testdf1 %>%
  pivot_longer(cols = -c(b1:b3)) %>%
  group_by(b1, b2, b3, value) %>%
  summarise(N = n()) %>%
  ungroup() %>%
  drop_na() %>%
  pivot_wider(names_from = c("b1", "b2", "b3"), values_from = "N")
CodePudding user response:
Use crossprod. For 6 questions, just select the matrix appropriately:
crossprod(!is.na(testdf1[1:3]), !is.na(testdf1[4:6]))
   b1 b2 b3
a1  1  3  2
a2  2  1  2
a3  2  0  1
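As for why this works: `!is.na()` turns each column into an "answered" indicator, and `crossprod(A, B)` computes `t(A) %*% B`, so each cell sums the product of two indicators over all rows, which is the number of rows where both questions were answered. A minimal sketch checking that equivalence, with the data rebuilt from the question so it runs on its own (the names `A`, `B`, `by_crossprod` and `by_matmul` are only illustrative):

```r
# Data rebuilt from the question so the snippet is self-contained.
testdf1 <- data.frame(
  a1 = c("x1", NA, "x1", NA, "x1"),
  a2 = c(NA, "x2", "x2", "x2", NA),
  a3 = c(NA, "x3", NA, "x3", NA),
  b1 = c("y1", "y1", NA, "y1", NA),
  b2 = c("y2", NA, "y2", NA, "y2"),
  b3 = c("y3", NA, "y3", "y3", NA)
)

# Numeric indicator matrices: 1 where a question was answered, 0 where it is NA.
A <- (!is.na(testdf1[1:3])) * 1
B <- (!is.na(testdf1[4:6])) * 1

# crossprod(A, B) is the same matrix product as t(A) %*% B.
by_crossprod <- crossprod(A, B)
by_matmul    <- t(A) %*% B

all.equal(by_crossprod, by_matmul)
by_crossprod
```

Note that rows here are a1:a3 and columns are b1:b3; wrapping the result in `t()` would give the transposed layout if b1:b3 should be the rows.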
This can be further generalized as:
list1 <- split.default(testdf1, sub("\\d+", '', names(testdf1)))
do.call(crossprod, unname(lapply(list1, Negate(is.na))))
   b1 b2 b3
a1  1  3  2
a2  2  1  2
a3  2  0  1
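The generalized version can be read step by step. The sketch below spells out the intermediate objects; `groups`, `indicators` and `result` are illustrative names that are not in the answer, and it assumes the column names follow the letter-plus-number pattern and form exactly two groups, since `crossprod` takes two matrices:

```r
# Data rebuilt from the question so the snippet is self-contained.
testdf1 <- data.frame(
  a1 = c("x1", NA, "x1", NA, "x1"),
  a2 = c(NA, "x2", "x2", "x2", NA),
  a3 = c(NA, "x3", NA, "x3", NA),
  b1 = c("y1", "y1", NA, "y1", NA),
  b2 = c("y2", NA, "y2", NA, "y2"),
  b3 = c("y3", NA, "y3", "y3", NA)
)

# Strip the digits from each column name to get its group label.
groups <- sub("\\d+", "", names(testdf1))

# split.default() treats the data frame as a list of columns,
# so this yields one data frame per group.
list1 <- split.default(testdf1, groups)
names(list1)

# One logical "answered" matrix per group.
indicators <- lapply(list1, function(d) !is.na(d))

# unname() drops the list names so the two matrices are passed to
# crossprod() positionally instead of as arguments named "a" and "b".
result <- do.call(crossprod, unname(indicators))
result
```

The groups come out in sorted order of their labels, so the first label alphabetically ends up as the rows of the result.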
CodePudding user response:
Using outer with a Vectorized function.
A <- !is.na(testdf1[1:3])
B <- !is.na(testdf1[4:6])
outer((sa <- seq_len(ncol(A))), (sb <- seq_len(ncol(B))),
      Vectorize(\(x, y) sum(A[, x] + B[, y] == 2))) |>
`dimnames<-`(list(paste0('a', sa), paste0('b', sb)))
#    b1 b2 b3
# a1  1  3  2
# a2  2  1  2
# a3  2  0  1
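The condition `A[, x] + B[, y] == 2` is true only on rows where both indicators are TRUE, so it could equally be written as `A[, x] & B[, y]`. For readers who find `outer` plus `Vectorize` hard to follow, here is a sketch of the same pairwise count as an explicit double loop; it is not part of the answer, and `counts` is an illustrative name:

```r
# Data rebuilt from the question so the snippet is self-contained.
testdf1 <- data.frame(
  a1 = c("x1", NA, "x1", NA, "x1"),
  a2 = c(NA, "x2", "x2", "x2", NA),
  a3 = c(NA, "x3", NA, "x3", NA),
  b1 = c("y1", "y1", NA, "y1", NA),
  b2 = c("y2", NA, "y2", NA, "y2"),
  b3 = c("y3", NA, "y3", "y3", NA)
)

A <- !is.na(testdf1[1:3])
B <- !is.na(testdf1[4:6])

# Empty result matrix, labelled with the question names.
counts <- matrix(0L, nrow = ncol(A), ncol = ncol(B),
                 dimnames = list(colnames(A), colnames(B)))

# For every (a, b) pair, count the rows where both were answered.
for (i in seq_len(ncol(A))) {
  for (j in seq_len(ncol(B))) {
    counts[i, j] <- sum(A[, i] & B[, j])
  }
}

counts
```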