Calculate average value of all sorted array combinations-CodePudding

I need to get the statistical expected value of a n choose k drawing in a sorted array.

As an example, let's consider I want to choose 2 elements from the following sorted array

[1, 2, 3]

The set of all possible combinations is the following:

(1, 2)
(1, 3)
(2, 3)

So the expected value of the first element is (1 1 2) / 3 = 1.33, and the expected value of the second element is (2 3 3) = 2.67

Here is a function that works with a bruteforce approach for doing that, but it is too slow to be used on large arrays. Is there a smarter/faster way?

import itertools
import math
def combinations_expected_value(arr, k):
    sums = [0] * k
    l = math.comb(len(arr), k)
    for comb in itertools.combinations(arr, k):
        for i in range(k):
            sums[i]  = comb[i]
    
    return [sums[i] / l for i in range(k)]

Thank you!

CodePudding user response：

For each position in the combination, the possible values are a subset of the list starting at the position and up to the last k-p-1 element. e.g. for combinations of 6 in 1..100, position 3 can only contain values 3..96

For each of the positon/value pairs, the number of occurrences will be the product of combinations of left side elements and combinations of right side elements.

For example, for combinations of 6 elements within a list of 1..100, the number of times 45 will appear at the third position is the combinations of 2 in 1..44 times the combinations of 3 in 46..100. So we will have C(44,2) * C(55,3) * 45 for that positon/value pair.

You can repeat this calculation for each positon/value pair to obtain a total for each position in the output combinations. Then divide these totals by the number of combinations to get the expected value:

from math import comb

def countComb(N,k):
    result = [0]*k
    for p in range(k):                 # p is count on the left
        q = k-p-1                      # q is count on the right
        for i in range(p,len(N)-q):
            left  = comb(i,p)          # combinations on the left >= 1
            right = comb(len(N)-i-1,q) # combinations on the right >= 1
            result[p]  = left * right * N[i]
    return result
            
def combProb(N,k):
    Cnk = comb(len(N),k)
    return [S/Cnk for S in countComb(N,k)]

Output:

print(countComb([1,2,3],2)) # [4, 8]
print(combProb([1,2,3],2))  # [1.3333333333333333, 2.6666666666666665]

print(countComb([1,2,3,4,5],3)) # [15, 30, 45]
print(combProb([1,2,3,4,5],3))  # [1.5, 3.0, 4.5]

# test with large number of combinations:

print(countComb(list(range(1,301)),7))
[1521500803497675, 3043001606995350, 4564502410493025, 
 6086003213990700, 7607504017488375, 9129004820986050,
 10650505624483725]

print(combProb(list(range(1,301)),7))
[37.625, 75.25, 112.875, 150.5, 188.125, 225.75, 263.375]