I need to get the statistical expected value of a n choose k drawing in a sorted array.
As an example, let's consider I want to choose 2 elements from the following sorted array
[1, 2, 3]
The set of all possible combinations is the following:
(1, 2)
(1, 3)
(2, 3)
So the expected value of the first element is (1 1 2) / 3 = 1.33
, and the expected value of the second element is (2 3 3) = 2.67
Here is a function that works with a bruteforce approach for doing that, but it is too slow to be used on large arrays. Is there a smarter/faster way?
import itertools
import math
def combinations_expected_value(arr, k):
sums = [0] * k
l = math.comb(len(arr), k)
for comb in itertools.combinations(arr, k):
for i in range(k):
sums[i] = comb[i]
return [sums[i] / l for i in range(k)]
Thank you!
CodePudding user response:
For each position in the combination, the possible values are a subset of the list starting at the position and up to the last k-p-1 element. e.g. for combinations of 6 in 1..100, position 3 can only contain values 3..96
For each of the positon/value pairs, the number of occurrences will be the product of combinations of left side elements and combinations of right side elements.
For example, for combinations of 6 elements within a list of 1..100, the number of times 45 will appear at the third position is the combinations of 2 in 1..44 times the combinations of 3 in 46..100. So we will have C(44,2) * C(55,3) * 45 for that positon/value pair.
You can repeat this calculation for each positon/value pair to obtain a total for each position in the output combinations. Then divide these totals by the number of combinations to get the expected value:
from math import comb
def countComb(N,k):
result = [0]*k
for p in range(k): # p is count on the left
q = k-p-1 # q is count on the right
for i in range(p,len(N)-q):
left = comb(i,p) # combinations on the left >= 1
right = comb(len(N)-i-1,q) # combinations on the right >= 1
result[p] = left * right * N[i]
return result
def combProb(N,k):
Cnk = comb(len(N),k)
return [S/Cnk for S in countComb(N,k)]
Output:
print(countComb([1,2,3],2)) # [4, 8]
print(combProb([1,2,3],2)) # [1.3333333333333333, 2.6666666666666665]
print(countComb([1,2,3,4,5],3)) # [15, 30, 45]
print(combProb([1,2,3,4,5],3)) # [1.5, 3.0, 4.5]
# test with large number of combinations:
print(countComb(list(range(1,301)),7))
[1521500803497675, 3043001606995350, 4564502410493025,
6086003213990700, 7607504017488375, 9129004820986050,
10650505624483725]
print(combProb(list(range(1,301)),7))
[37.625, 75.25, 112.875, 150.5, 188.125, 225.75, 263.375]