Given a 2D array, I would like to find the product of all unique "combinations" or "paths" along the last axis.
If the starting array is
[[1, 2, 3, 4, 5, 6],
[7, 8, 9, 10, 11, 12]]
Then the unique combinations along the last axis would be
[[1, 2, 3, 4, 5, 6],
[1, 2, 3, 4, 5, 12],
[1, 2, 3, 4, 11, 6],
[1, 2, 3, 4, 11, 12],
...,
[7, 8, 9, 10, 11, 6],
[7, 8, 9, 10, 11, 12]]
And then you can apply numpy.prod(, axis=-1)
. However, is there a faster/more efficient way to retrieve these unique combinations without manually iterating through?
CodePudding user response:
One solution is to use itertools
to generate all the possible indices and then use Numpy indirect indexing so to extract the actual paths:
a = np.array([[1, 2, 3, 4, 5, 6],
[7, 8, 9, 10, 11, 12]])
rowCount = a.shape[0]**a.shape[1]
x = np.tile(np.arange(a.shape[1]), rowCount).reshape(-1, a.shape[1])
y = np.array(list(itertools.product(*[range(2)]*6)))
result = a[y, x]
product = np.prod(a[y, x], axis=-1)
Note that this solution is not very efficient with bigger arrays. In fact, as said in the comment, the number of possible paths grows exponentially (n**m
where n, m = a.shape
) and thus the required memory space grow in O(n**m * m)
. Even the size of the output product
grows experientially (n**m
).
CodePudding user response:
You can use this generalization of Cartessian product in numpy
:
a = np.array([[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]])
np.stack(np.meshgrid(*a.T), axis=-1).reshape(-1, len(a.T))
Output:
array([[ 1, 2, 3, 4, 5, 6],
[ 1, 2, 3, 4, 5, 12],
[ 1, 2, 3, 4, 11, 6],
...
[ 7, 8, 9, 10, 5, 12],
[ 7, 8, 9, 10, 11, 6],
[ 7, 8, 9, 10, 11, 12]])
CodePudding user response:
I would argue that if you only want the final products its better not to generate the complete array of combinations. This method computes the products iteratively which reduces memory usage.
a = np.array([[1, 2, 3, 4, 5, 6],
[7, 8, 9, 10, 11, 12]])
from itertools import product
m, n = a.shape
combs = product(*[range(m)]*n)
products = [np.product(a[comb, range(n)]) for comb in combs]
print(products)
[720, 1440, 1584, 3168, 1800, 3600, ... etc.
or using np.fromiter
products = np.fromiter((np.product(a[comb, range(n)]) for comb in combs),
dtype=float, count=m**n)
CodePudding user response:
Here is another approach that finds all products recursively based on the number of columns in a
. The output order is different from other answers; a benchmark below shows a 50x speedup over the next fastest answer.
def products(arr):
# if there is one column left, return the last column
if arr.shape[1] == 1:
return arr[:,0]
mid = arr.shape[1] // 2
# return the flattened outer product of recursive calls
return np.outer(products(arr[:,:mid]), products(arr[:,mid:])).ravel()
a = np.array([[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]])
result = products(a)
# [ 720 1440 1584 3168 1800 3600 3960 7920 2160 4320
# 4752 9504 5400 10800 11880 23760 2880 5760 6336 12672
# 7200 14400 15840 31680 8640 17280 19008 38016 21600 43200
# 47520 95040 5040 10080 11088 22176 12600 25200 27720 55440
# 15120 30240 33264 66528 37800 75600 83160 166320 20160 40320
# 44352 88704 50400 100800 110880 221760 60480 120960 133056 266112
# 151200 302400 332640 665280]
Benchmark (4 rows, 10 columns):
import itertools
def with_meshgrid(a):
b = np.stack(np.meshgrid(*a.T), axis=-1).reshape(-1, len(a.T))
res = np.prod(b, axis=-1)
return res
def with_itertools1(a):
m, n = a.shape
combs = itertools.product(*[range(m)]*n)
products = [np.product(a[comb, range(n)]) for comb in combs]
return products
def with_itertools2(a):
rowCount = a.shape[0]**a.shape[1]
x = np.tile(np.arange(a.shape[1]), rowCount).reshape(-1, a.shape[1])
y = np.array(list(itertools.product(*[range(a.shape[0])]*a.shape[1])))
result = a[y, x]
product = np.prod(a[y, x], axis=-1)
return product
a = np.random.randint(0, 100, size=(4, 10))
# %timeit products(a) # 3.69 ms
# %timeit with_meshgrid(a) # 201 ms
# %timeit with_itertools1(a) # timeout
# %timeit with_itertools2(a) # 3.1 s
assert (np.sort(products(b)) == np.sort(with_meshgrid(b))).all()