I am trying to simulate the answers to a multi-choice question test (MCQ). Currently, I am using the following code to simulate the answers to a MCQ with only two questions:
answers <- data.frame(
Q1 = sample(LETTERS[1:5],10,replace = T, prob=c(0.1,0.6,0.1,0.1,0.1)),
Q2 = sample(LETTERS[1:5],10,replace = T, prob=c(0.5,0.1,0.1,0.2,0.1)))
The answers B and A are, respectively, the correct answers to Q1 and Q2.
My difficulty is to introduce correlation among the answers to the questions, in the sense that, for instance, a good student tends to select the correct answer to all questions. How can I accomplish that?
CodePudding user response:
You could fill up the data with completely correct answers, assign a level of proficiency to each individual student and then randomly change values in their exams, depending on their proficiency:
correct = c(2,1,3)
nstudents = 20
exam = matrix(LETTERS[rep(correct,nstudents)],ncol=length(correct),byrow=T)
colnames(exam)=paste("Q",1:length(correct),sep="")
proficiency = runif(nstudents,1,5)/5 ## Each student has a level of expertise
for(question in 1:length(correct)){
difficulty = runif(nstudents,1,10)/10 ## Random difficulty for each question and student (may be made more or less difficult)
nmistakes = sum(proficiency<difficulty)
exam[,question][proficiency<difficulty] = sample(LETTERS[1:5],nmistakes,replace=T)
}
exam = as.data.frame(exam)
The result would be a data frame in which some students hardly ever make mistakes while others hardly ever get something right.
EDIT: The proficiency, in this case, follows an uniform distribution. If you need them normally distributed, just change the proficiency
vector to use rnorm()
.
CodePudding user response:
Here is a method that applies a covariance matrix Sigma=
using MASS::mvrnorm
.
n <- 15
r <- .9
set.seed(42)
library('MASS')
M <- abs(mvrnorm(n=n, mu=c(1, 500), Sigma=matrix(c(1, r, r, 1), nrow=2),
empirical=TRUE)) |>
as.data.frame() |>
setNames(c('Q1', 'Q2'))
We get the correlated levels A, ..., B by cut
ting the random numbers along custom quantile
s (taken from OP),
f <- \(x, q) cut(x, breaks=c(0, quantile(x, cumsum(q))), include.lowest=T,
labels=LETTERS[1:5])
p1 <- c(0.1, 0.6, 0.1, 0.1, 0.1)
p2 <- c(0.5, 0.1, 0.1, 0.2, 0.1)
in a Map()
call.
dat <- Map(f, M, list(p1, p2)) |>
as.data.frame()
dat
# Q1 Q2
# 1 A A
# 2 B A
# 3 E E
# 4 D E
# 5 A A
# 6 B A
# 7 C D
# 8 B A
# 9 B A
# 10 B A
# 11 B C
# 12 B B
# 13 E D
# 14 B A
# 15 C D
Check
dat_check <- lapply(dat, as.integer) |> as.data.frame()
cor(dat_check) ## correlation
# Q1 Q2
# Q1 1.00000 0.85426
# Q2 0.85426 1.00000
lapply(dat, table) ## students' answers
# $Q1
#
# A B C D E
# 2 8 2 1 2
#
# $Q2
#
# A B C D E
# 8 1 1 3 2