Fastest way to count the number of times each face of an N-sided die appears in M rolls

Given M rolls of an N-sided die, I want to generate an array of length N that stores the number of times each face appears.

For example, for 10 rolls of a 6-sided die you might get:

[2, 3, 2, 1, 1, 1]

That is, two 1s, three 2s, two 3s, one 4, one 5, and one 6.

The obvious first solution is to just generate M random numbers and count them:

Random r = new Random();
int[] counts = new int[6]; // C# zero-initializes the array
// r.Next(6) is uniform over 0..5; Math.Round(r.NextDouble() * 5) would make faces 0 and 5 half as likely
for (int i = 0; i < 10; i++) counts[r.Next(6)]++;

I was wondering if there was anything faster. A thought I had was to start from the expected counts and apply some random "shuffling":

For 60 rolls of a 6-sided die, you expect counts of:

[10, 10, 10, 10, 10, 10]

You could then randomly select 2 indexes and add 1 and subtract 1:

[11, 10, 9, 10, 10, 10]

Then repeat this X times. The problem is that this doesn't sample from the true distribution of counts; it only produces something that looks convincing at large M. Plus, how would you pick X?

Another thought was to imagine a line of length M and pick N splits in it to generate your groups:

9 rolls:

0 1 2 3 4 5 6 7 8 9
|-----------------|
|   | |   | |     |
1   2 3   4 5     6

[2-0, 3-2, 5-3, 6-5, 9-6, 9-9]
[ 2 ,  1 ,  2 ,  1 ,  3 ,  0 ]

The problem with this method is it doesn't properly represent the probability created by dice rolls. For example, the configuration [3, 0, 0, 1, 1, 1] from 6 dice rolls created by this method has a different probability of showing up than you actually rolling 3 1's, 0 2's, 0 3's, 1 4, 1 5 and 1 6.
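To make the mismatch concrete, here is a small sketch that computes both probabilities for [3, 0, 0, 1, 1, 1]. It assumes the split method makes every distinct configuration equally likely (the stars-and-bars count), which depends on exactly how the split points are drawn; the function names are only for illustration.

```python
from math import comb, factorial

def multinomial_probability(counts, sides):
    """Probability of seeing exactly these face counts when rolling a fair die."""
    rolls = sum(counts)
    arrangements = factorial(rolls)
    for c in counts:
        arrangements //= factorial(c)  # number of roll sequences giving these counts
    return arrangements / sides ** rolls

def uniform_split_probability(rolls, sides):
    """Probability of one configuration if all configurations were equally likely.

    Assumption: the split method is uniform over the C(rolls + sides - 1, sides - 1)
    possible count vectors.
    """
    return 1 / comb(rolls + sides - 1, sides - 1)

counts = [3, 0, 0, 1, 1, 1]
p_dice = multinomial_probability(counts, 6)   # 120 / 46656
p_split = uniform_split_probability(6, 6)     # 1 / 462
print(p_dice, p_split)
```

Under that assumption the two values differ (about 0.0026 versus 0.0022), and the gap is much larger for an even spread such as [1, 1, 1, 1, 1, 1], which real dice produce far more often than any single lopsided configuration.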

Maybe it's just not possible to do this any faster than just performing the rolls and counting.

CodePudding user response：

Thanks to Robert Dodier for pointing out I can just use a multinomial distribution!

In Python, see numpy.random.multinomial: https://numpy.org/doc/stable/reference/random/generated/numpy.random.multinomial.html

import numpy as np
np.random.multinomial(20, [1/6.]*6, size=1)

There is literature on how to implement this if you cannot use NumPy, but for me NumPy is enough and it runs very fast.

Here are some links though for those that want to look into implementation:

CodePudding user response：

Disclaimer: the below doesn't currently provide enough information to improve upon your most efficient solution, as I'm not aware of the exact mathematical solution I'd need to complete it, but I wanted to post it anyway in case someone else can see the discussion through to completion.

For your line-of-length-M solution, delineating each group amounts to generating a random number in [0, 1) and finding where it falls within the binomial cumulative distribution. For example, if you had a 3-sided die and four rolls, here's how you'd go about determining where to split the groups:

To simulate the number of times a "1" was rolled, generate a first random number in [0, 1); let's say you get 0.67. The binomial cumulative distribution for p=1/3, n=4 is:

Number of successes	Discrete probability	Cumulative probability
0	(1/3)^0 * (2/3)^4 * (4C0) ≈ 0.20	0.20
1	(1/3)^1 * (2/3)^3 * (4C1) ≈ 0.40	0.59
2	(1/3)^2 * (2/3)^2 * (4C2) ≈ 0.30	0.89
3	(1/3)^3 * (2/3)^1 * (4C3) ≈ 0.10	0.99
4	(1/3)^4 * (2/3)^0 * (4C4) ≈ 0.01	1

Since your randomly generated 0.67 falls between 0.59 and 0.89 in the last column, you've simulated 2 successes, i.e., you've rolled a "1" two times.
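The lookup just described can be sketched in a few lines of Python. The helper names here are made up for illustration, and the table is built in full, which is exactly the O(n) cost this answer is worried about.

```python
from math import comb

def binomial_cdf_table(n, p):
    """Return the cumulative probabilities P(X <= k) for k = 0..n."""
    table, running = [], 0.0
    for k in range(n + 1):
        running += comb(n, k) * p**k * (1 - p)**(n - k)
        table.append(running)
    return table

def successes_for(u, table):
    """Return the smallest k whose cumulative probability exceeds u."""
    for k, cumulative in enumerate(table):
        if u < cumulative:
            return k
    return len(table) - 1  # guard against floating-point shortfall in the last entry

table = binomial_cdf_table(4, 1/3)
print([round(c, 2) for c in table])
print(successes_for(0.67, table))  # 2
```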

To simulate the number of times a "2" was rolled, repeat the same procedure with another random number, this time using the binomial cumulative distribution for n=2, as only two of the four rolls remain, and for p=1/2 rather than 1/3: each remaining roll is known not to be a "1", so it is a "2" with probability (1/3)/(2/3) = 1/2.
Repeat for all sides of the die, rescaling p the same way each time; the final side is rolled all the remaining times.

Calculating an entire binomial cumulative distribution table is O(M) per table, so O(M*N) in total, before even accounting for the cost of the arithmetic, so it doesn't help efficiency if you actually have to produce the complete tables. If there's an efficient way to take the number 0.67 and know that it corresponds to 2 successes, then this solution might have some merit.
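One way to avoid building full tables is to walk the cumulative distribution one term at a time and stop as soon as it passes the random number, using the recurrence P(X = k+1) = P(X = k) * (n - k) / (k + 1) * p / (1 - p). The sketch below is only that, a sketch: the function names are hypothetical, it rescales each face's probability to 1 / (number of faces not yet handled) so that the counts match fair dice, and it is meant for moderate M because the starting term (1 - p)^n underflows to zero for very large n.

```python
import random

def sample_binomial(n, p, rng):
    """Draw from Binomial(n, p) by walking the CDF term by term (inverse transform)."""
    if p <= 0.0:
        return 0
    if p >= 1.0:
        return n
    u = rng.random()
    pmf = (1.0 - p) ** n      # P(X = 0); underflows for very large n (known limitation)
    cdf = pmf
    ratio = p / (1.0 - p)
    k = 0
    while u >= cdf and k < n:
        pmf *= (n - k) / (k + 1) * ratio  # step from P(X = k) to P(X = k + 1)
        k += 1
        cdf += pmf
    return k

def sample_face_counts(rolls, sides, rng=None):
    """Sample how often each face of a fair die appears, one face at a time."""
    rng = rng or random.Random()
    counts = []
    remaining = rolls
    for face in range(sides - 1):
        p = 1.0 / (sides - face)  # chance of this face among faces not yet handled
        hits = sample_binomial(remaining, p, rng)
        counts.append(hits)
        remaining -= hits
    counts.append(remaining)      # the last face takes whatever is left
    return counts

print(sample_face_counts(60, 6, random.Random(0)))
```

Each draw visits one term per success plus one, so the whole procedure takes on the order of M + N steps. By itself that is not an asymptotic improvement over rolling and counting; a real speedup would need a binomial sampler whose cost does not grow with the number of successes.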