I have a set of numbers n_1, n_2, …, n_k. I need to find the sum (or mean, it's the same problem up to a constant factor) of the logsumexp of all possible subsets of this set of numbers. Is there a way to approximate this or compute it exactly?
Note: the logsumexp of a, b, c is log(e^a + e^b + e^c) (exp, followed by sum, followed by log).
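For concreteness, here is a brute-force Python sketch of what I want (names are mine; it enumerates every nonempty subset, so it is only feasible for small k):

```python
import math
from itertools import combinations

def logsumexp(xs):
    # log(e^x_1 + ... + e^x_m), shifting by the max for numerical stability
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def mean_logsumexp_over_subsets(ns):
    # Mean of logsumexp over all 2^k - 1 nonempty subsets.
    # Only feasible for small k, since it enumerates every subset.
    values = [logsumexp(s)
              for r in range(1, len(ns) + 1)
              for s in combinations(ns, r)]
    return sum(values) / len(values)
```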
CodePudding user response:
I don’t know if this will be accurate enough, but log sum exp is sort of a smooth analog of max, so one possibility would be to sort so that n_1 ≥ n_2 ≥ … ≥ n_k and return ∑_i (2^(k−i) / (2^k − 1)) n_i, which under-approximates the true mean by an error term between 0 and log k. (The weight comes from the fact that n_i is the maximum of exactly 2^(k−i) of the 2^k − 1 nonempty subsets, so this sum is just the mean over subsets of the max.)
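A minimal sketch of this lower bound in Python (the function name is my own invention):

```python
def weighted_max_lower_bound(ns):
    # Sort so that n_1 >= n_2 >= ... >= n_k.  The i-th value (1-based) is the
    # maximum of exactly 2^(k-i) of the 2^k - 1 nonempty subsets, so this is
    # the mean over subsets of the max, which lies within log k below the
    # mean over subsets of the log sum exp.
    ns = sorted(ns, reverse=True)
    k = len(ns)
    # 2 ** (k - i - 1) rather than 2 ** (k - i) because enumerate is 0-based
    return sum(2 ** (k - i - 1) * n for i, n in enumerate(ns)) / (2 ** k - 1)
```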
You could also use a sample mean of the log sum exp of {n_i} ∪ (a random subset of {n_(i+1), n_(i+2), …, n_k}) instead of n_i in the sum. By using enough samples, you can make the approximation as good as you like (though obviously at some point it’s cheaper to evaluate with brute force).
(I’m assuming that the empty set is omitted.)
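A sketch of the sampling variant in Python (the function name and the default sample count are arbitrary choices):

```python
import math
import random

def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def sampled_mean_logsumexp(ns, samples=200, rng=random):
    # For each i, estimate the conditional mean over subsets whose largest
    # element is n_i by averaging the log sum exp of {n_i} plus a uniformly
    # random subset of the smaller elements, then combine the estimates with
    # the same 2^(k-i) / (2^k - 1) weights as above.
    ns = sorted(ns, reverse=True)
    k = len(ns)
    acc = 0.0
    for i in range(k):
        tail = ns[i + 1:]
        est = sum(logsumexp([ns[i]] + [x for x in tail if rng.random() < 0.5])
                  for _ in range(samples)) / samples
        acc += 2 ** (k - i - 1) * est
    return acc / (2 ** k - 1)
```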
CodePudding user response:
On a different tack, here’s a deterministic scheme whose additive error is at most ε. As in my other answer, we sort n_1 ≥ … ≥ n_k, define an approximation f(i) of the mean log sum exp over nonempty subsets whose minimum index is i (i.e., whose largest element is n_i), and evaluate ∑_i (2^(k−i) / (2^k − 1)) f(i). If each f(i) is within ε, then so is the result.
Fix i, and for 1 ≤ j ≤ k−i define d_j = n_(i+j) − n_i. Note that d_j ≤ 0. We define f(i) = n_i + g(i), where g(i) will approximate the mean of log(1 + ∑_(j∈T) e^(d_j)) over subsets T ⊆ {1, …, k−i}.
I don’t want to write the next part formally, since I believe it would be harder to understand. The idea is that, since log sum exp is commutative and associative, we can speed up the brute-force algorithm that initializes the list of results as [0] and then, for each j, doubles the size of the list by appending the log sum exp of d_j with each of its current elements. I claim that, if we round each intermediate result down to the nearest multiple of δ = ε/k, then the result will under-approximate by at most ε. By switching from the list to its histogram, we can do each step in time proportional to the number of distinct entries. Finally, we set g(i) to the histogram average.
To analyze the running time, in the worst case we have d_j = 0 for all j, making the largest possible result log k. This means that the list can have at most (log k)/δ + 1 = ε^(−1) k log k + 1 entries, making the total running time O(ε^(−1) k^3 log k). (This can undoubtedly be improved slightly with a faster convolution algorithm and/or by rounding the d_j and taking advantage.)
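A Python sketch of the whole scheme, keeping the histogram keyed by grid index (the names and that particular representation are my own choices):

```python
import math
from collections import defaultdict

def eps_approx_mean_logsumexp(ns, eps):
    # Under-approximates the mean log sum exp over nonempty subsets of ns,
    # with additive error at most eps.
    ns = sorted(ns, reverse=True)          # n_1 >= n_2 >= ... >= n_k
    k = len(ns)
    delta = eps / k                        # rounding grid
    acc = 0.0
    for i in range(k):
        ds = [ns[j] - ns[i] for j in range(i + 1, k)]    # d_j <= 0
        # Histogram of rounded intermediate results: grid index q stands for
        # the value q * delta; start from log(1) = 0, i.e. the subset {n_i}.
        hist = {0: 1}
        for d in ds:
            new_hist = defaultdict(int)
            for q, c in hist.items():
                new_hist[q] += c                         # subsets skipping d
                x = q * delta                            # current value, >= 0 >= d
                y = x + math.log1p(math.exp(d - x))      # log sum exp of x and d
                new_hist[int(y // delta)] += c           # round down to the grid
            hist = new_hist
        # g(i): mean of log(1 + sum_{j in T} e^d_j) over all subsets T of the
        # elements below n_i (there are 2^len(ds) of them).
        g = delta * sum(q * c for q, c in hist.items()) / 2 ** len(ds)
        acc += 2 ** (k - i - 1) * (ns[i] + g)            # weight times f(i)
    return acc / (2 ** k - 1)
```

For small k this can be checked against brute-force enumeration of all nonempty subsets.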