If I have a NumPy array [a b c d], is there a way to get the sum of the multiplication of each element with all other elements excluding itself and without duplicates?
For example, the sum for this array would be: ab ac ad bc bd cd.
Using for loops is an option, but it wouldn't be very efficient when you have very large arrays. So I was wondering if there is a more efficient way in NumPy.
CodePudding user response:
Using math, you can refactor the sum of product into a product of sums.
a*b a*c a*d b*c b*d c*d
(a b c)*d (a b)*c a*b
This gives you the following:
out = (a[1:]*np.cumsum(a[:-1])).sum()
Example:
a = np.array([3, 6, 4, 9, 10])
np.triu(a*a[:,None], k=1).sum()
# 391
(a[1:]*np.cumsum(a[:-1])).sum()
# 391
CodePudding user response:
Here's a test of @mozwy's idea of using the upper triangle.
A sample array:
In [11]: x = np.array([1,2,3,4]); a,b,c,d = x
the sum of products:
In [12]: a*b a*c a*d b*c b*d c*d
Out[12]: 35
The upper tri:
In [13]: A = np.triu(np.ones((4,4),int),k=1)
In [14]: A
Out[14]:
array([[0, 1, 1, 1],
[0, 0, 1, 1],
[0, 0, 0, 1],
[0, 0, 0, 0]])
And matrix product:
In [15]: x@A@x
Out[15]: 35
Another approach, starting with the full outer product:
In [17]: outer = x[:,None]*x
In [18]: outer
Out[18]:
array([[ 1, 2, 3, 4],
[ 2, 4, 6, 8],
[ 3, 6, 9, 12],
[ 4, 8, 12, 16]])
Remove the trace, and you are left with twice the desired result:
In [19]: np.trace(outer) # or x@x
Out[19]: 30
In [20]: (np.sum(outer)-np.trace(outer))/2
Out[20]: 35.0
equivalently using einsum
:
In [25]: (np.einsum('i,j->',x,x)-np.einsum('i,i',x,x))/2
Out[25]: 35.0
With numpy
it is easy, and relatively efficient, to do the full sum of outer products; "vectorization" means doing the whole-array calculations in compiled code, even if means some "extra" work and memory use.