How to create a Test of 20 questions from 11 chapters with different numbers of questions?

I am looking for advice because I think my method may not be the best one.

I have 11 chapters, and each chapter has a different number of questions (900 questions in total). The distribution looks like this:

Chapter 1 ➝ 3 questions
Chapter 2 ➝ 23 questions
Chapter 3 ➝ 70 questions
Chapter 4 ➝ 260 questions
Chapter 5 ➝ 60 questions
Chapter 6 ➝ 90 questions
Chapter 7 ➝ 61 questions
Chapter 8 ➝ 23 questions
Chapter 9 ➝ 97 questions
Chapter 10 ➝ 19 questions
Chapter 11 ➝ 194 questions

Now, I want to create 45 tests where each test contains 20 questions. My goals for the tests are:

A question should not appear in more than 1 test. So essentially every test is unique.
The test should be truly random. It should not happen that a test contains all or most questions from a single chapter. It must include questions from other chapters as well.
It is not mandatory that a test contains one or more questions from all chapters.

This will help ensure that all questions are covered in the tests and for a given test, the questions come from many chapters, and not a single chapter.

My current thinking is as follows:

Put all questions from all chapters in one array. This gives an array of 900 questions.
Randomly shuffle this array.
Split the resulting array into 45 arrays, each with 20 questions (45*20 = 900 questions).
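
The three steps above can be sketched in Python as follows. This is only an illustration under assumptions: the names `CHAPTER_SIZES`, `shuffle_and_split` and `largest_chapter_share` are made up here, and `(chapter, index)` tuples stand in for real questions. The last line reports how many questions the most lopsided test draws from a single chapter, which is a quick way to judge the second goal.

```python
import random
from collections import Counter

# Chapter sizes from the question; (chapter, index) tuples are placeholder questions.
CHAPTER_SIZES = [3, 23, 70, 260, 60, 90, 61, 23, 97, 19, 194]

def shuffle_and_split(questions, test_count, rng=None):
    """Shuffle all questions together, then cut the result into equal tests.

    Assumes len(questions) is divisible by test_count; leftovers would be dropped.
    """
    if rng is None:
        rng = random.Random()
    pool = list(questions)
    rng.shuffle(pool)
    size = len(pool) // test_count
    return [pool[i * size:(i + 1) * size] for i in range(test_count)]

def largest_chapter_share(test):
    """Return the highest number of questions a test draws from one chapter."""
    return max(Counter(chapter for chapter, _ in test).values())

questions = [(ch, n)
             for ch, count in enumerate(CHAPTER_SIZES, start=1)
             for n in range(count)]
tests = shuffle_and_split(questions, 45)
print(max(largest_chapter_share(t) for t in tests))
```

Running it several times gives a feel for how often a single chapter dominates one of the 45 tests.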

I want to know whether this approach helps achieve the goals, or whether there is a better way to approach this problem.

Thank you in advance

CodePudding user response：

True randomness creates patterns that do not look random to us. I would therefore advocate a method that artificially spreads neighboring questions evenly across the tests.

The following approach will make tests that look more random.

from random import random

def distribute(questions, test_count):
    tests = [[] for _ in range(test_count)]
    weights = [0.5 for _ in range(test_count)]
    capacity = len(questions) / test_count  # questions per test
    for q in questions:
        # Every test gains a little weight; only tests with positive weight
        # that are not yet full are eligible to receive this question.
        total_weight = 0
        for i in range(test_count):
            weights[i] += 1 / test_count
            if 0 < weights[i] and len(tests[i]) < capacity:
                total_weight += weights[i]

        # Pick an eligible test with probability proportional to its weight.
        r = random()
        choice = None
        for i in range(test_count):
            if 0 < weights[i] and len(tests[i]) < capacity:
                choice = i  # also the fallback if roundoff leaves r >= 0
                r -= weights[i] / total_weight
                if r < 0:
                    break

        # Taking a question costs the chosen test a full unit of weight.
        tests[choice].append(q)
        weights[choice] -= 1

    return tests

With your approach and that distribution of questions, odds are good that at least one test will have over half of its questions from chapter 4, even though that chapter holds under 30% of the total available questions.

Why? Consider a biased coin with odds 260/900 of giving a 1. This is a random variable with mean 260/900 = 0.28888... and variance 260/900 * (1 - 260/900) = 0.2054320987.... The draws aren't quite independent, but they are close enough that the sum of 20 of them will look a lot like a normal distribution with mean 5.7777777... and variance 4.1086419753..., for a standard deviation of 2.02697853350958.... With odds fairly close to 1/45, this should land more than 2.33 standard deviations away, at 10.5, which corresponds in the normal approximation to 11 questions from chapter 4.
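
The arithmetic above can be reproduced with a few lines of Python. This is a rough sketch under the same assumption of independent draws; the function name `chapter_share_normal_approx` and its parameters are made up for illustration, and the tail probability comes from the standard normal distribution via `math.erfc`.

```python
from math import erfc, sqrt

def chapter_share_normal_approx(chapter_size, total, test_size, threshold):
    """Normal approximation for how many questions one test draws from a chapter."""
    p = chapter_size / total               # chance a single question is from the chapter
    mean = test_size * p                   # expected questions from that chapter per test
    sd = sqrt(test_size * p * (1 - p))     # standard deviation of that count
    z = (threshold - mean) / sd            # distance of the threshold from the mean, in sd
    upper_tail = 0.5 * erfc(z / sqrt(2))   # P(standard normal > z)
    return mean, sd, z, upper_tail

mean, sd, z, upper_tail = chapter_share_normal_approx(260, 900, 20, 10.5)
print(f"mean={mean:.4f} sd={sd:.4f} z={z:.2f}")
print(f"upper tail={upper_tail:.4f} either direction={2 * upper_tail:.4f}")
```

Under these assumptions the one-sided tail evaluates to roughly 1% per test and the two-sided tail to roughly 2%.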

With my approach, having more than 9 from that chapter is impossible, and having more than 7 is very unlikely.

CodePudding user response：

Here's another approach you can take. I'm loosely interpreting your goals to mean you'd like each chapter's questions to be well distributed among the tests, and each test to include a wide variety of chapters. I'm deliberately ignoring the requirement that it be "truly random" and leaning more toward a good mix of chapters and a variety of placements for the questions within each chapter, so we don't cluster the early questions in the first tests and the later questions in the last tests, or send a long sequence of one chapter's questions to a single test.

public Test[] generateTests() {
    int testCt=45;
    int skip=17; // Pick a prime number larger than your number of chapters.
    // This is a list of chapters, each holding a list of questions.
    List<List<Question>> questions=getQuestions();
    Test[] tests=newTests(testCt);
    int chapterLength=questions.size(); // number of chapters
    // This array holds the index of the next undistributed question in each chapter
    int[] qi=new int[chapterLength];
    int chapter=1;
    for (int i=0; i<900; i++) { // 900 = total number of questions
        do { // Find a chapter with undistributed questions
            chapter = (chapter + skip) % chapterLength;
        } while (qi[chapter]>=questions.get(chapter).size()); // skip exhausted chapters
        // Distribute that question to a test
        tests[i%testCt].add(questions.get(chapter).get(qi[chapter]++));
    }
    return tests;
}

The above code works by using a skip value that's co-prime to the number of chapters (even as the number of usable chapters shrinks). This guarantees that as you skip forward to the next question, you'll cycle through every chapter each time.
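
As a quick sanity check of the co-prime claim, the following Python sketch steps through chapter indexes the same way the loop does. The helper name `visit_order` is made up for illustration, and exhausted chapters are ignored here, so it shows only the first full cycle.

```python
from math import gcd

def visit_order(chapter_count, skip, start=1):
    """Return the chapter indexes visited in one cycle of (chapter + skip) % chapter_count."""
    order = []
    chapter = start
    for _ in range(chapter_count):
        chapter = (chapter + skip) % chapter_count
        order.append(chapter)
    return order

order = visit_order(11, 17)
print(gcd(17, 11))   # 1, so 17 and 11 are co-prime
print(order)         # [7, 2, 8, 3, 9, 4, 10, 5, 0, 6, 1]
```

Each of the 11 chapter indexes appears exactly once per cycle; a skip that shares a factor with the chapter count would revisit some chapters and miss others.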

While you can probably infer these next bits of support code pretty easily, I'm including them for reference. They need java.util.List and java.util.ArrayList to be imported.

    private class Question {
        public int chapter;
        public String q;
        public String a;
    }

    private class Test {
        public int id;
        public List<Question> questions=new ArrayList<>();

        public Test(int id) {
            this.id=id;
        }
        public void add(Question q) {
            questions.add(q);
        }
    }

    private List<List<Question>> getQuestions() {
        List<List<Question>> questions=new ArrayList<>();
        // Fill array with 11 chapters, each containing its associated questions.
        return questions;
    }

    private Test[] newTests(int ct) {
        Test[] tests=new Test[ct];
        for (int i=0; i<ct; i++) {
            tests[i]=new Test(i);
        }
        return tests;
    }

CodePudding user response：

That approach will not always satisfy the second criterion, since a test might easily draw more than 10 questions from a single large chapter like Chapter 4.

I suggest you create 45 empty tests, then for each chapter, shuffle the tests, and distribute the questions as if dealing cards: the first question to the first test, the second question to the second test, looping back to the first test if necessary, and removing a test from the cycle when it has 20 questions.
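
The card-dealing idea could look roughly like this in Python. It is a sketch under assumptions rather than a definitive implementation: the name `deal_questions` and its parameters are made up, `chapters` is assumed to be a list of per-chapter question lists, and questions are dealt in their existing order within each chapter.

```python
import random

def deal_questions(chapters, test_count, test_size, rng=None):
    """Deal each chapter's questions round-robin over a freshly shuffled test order."""
    if rng is None:
        rng = random.Random()
    tests = [[] for _ in range(test_count)]
    for chapter in chapters:
        # Only tests that still have room take part in this chapter's deal.
        order = [t for t in range(test_count) if len(tests[t]) < test_size]
        rng.shuffle(order)
        pos = 0
        for question in chapter:
            if not order:
                raise ValueError("more questions than available test slots")
            pos %= len(order)          # loop back to the first test when needed
            t = order[pos]
            tests[t].append(question)
            if len(tests[t]) == test_size:
                order.pop(pos)         # full test leaves the cycle; pos now points at the next one
            else:
                pos += 1
    return tests

# Chapter sizes from the question; (chapter, index) tuples are placeholder questions.
sizes = [3, 23, 70, 260, 60, 90, 61, 23, 97, 19, 194]
chapters = [[(ch, n) for n in range(count)] for ch, count in enumerate(sizes, start=1)]
tests = deal_questions(chapters, 45, 20)
print(max(sum(1 for ch, _ in t if ch == 4) for t in tests))
```

With the chapter sizes from the question, no test fills up before Chapter 4 has been dealt, so each test receives 5 or 6 of its 260 questions. Shuffling the questions inside each chapter first is an optional extra if their original order should not carry over.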