Numpy - Product of All Unique Combinations or Paths along an axis-CodePudding

Given a 2D array, I would like to find the product of all unique "combinations" or "paths" along the last axis.

If the starting array is

[[1, 2, 3, 4, 5, 6],
 [7, 8, 9, 10, 11, 12]]

Then the unique combinations along the last axis would be

[[1, 2, 3, 4, 5, 6],
 [1, 2, 3, 4, 5, 12],
 [1, 2, 3, 4, 11, 6],
 [1, 2, 3, 4, 11, 12],
 ...,
 [7, 8, 9, 10, 11, 6],
 [7, 8, 9, 10, 11, 12]]

And then you can apply numpy.prod(, axis=-1). However, is there a faster/more efficient way to retrieve these unique combinations without manually iterating through?

CodePudding user response：

One solution is to use itertools to generate all the possible indices and then use Numpy indirect indexing so to extract the actual paths:

a = np.array([[1, 2, 3,  4,  5,  6],
              [7, 8, 9, 10, 11, 12]])

rowCount = a.shape[0]**a.shape[1]
x = np.tile(np.arange(a.shape[1]), rowCount).reshape(-1, a.shape[1])
y = np.array(list(itertools.product(*[range(2)]*6)))
result = a[y, x]
product = np.prod(a[y, x], axis=-1)

Note that this solution is not very efficient with bigger arrays. In fact, as said in the comment, the number of possible paths grows exponentially (n**m where n, m = a.shape) and thus the required memory space grow in O(n**m * m). Even the size of the output product grows experientially (n**m).

CodePudding user response：

You can use this generalization of Cartessian product in numpy:

a = np.array([[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]])
np.stack(np.meshgrid(*a.T), axis=-1).reshape(-1, len(a.T))

Output:

array([[ 1,  2,  3,  4,  5,  6],
       [ 1,  2,  3,  4,  5, 12],
       [ 1,  2,  3,  4, 11,  6],
                            ...
       [ 7,  8,  9, 10,  5, 12],
       [ 7,  8,  9, 10, 11,  6],
       [ 7,  8,  9, 10, 11, 12]])

CodePudding user response：

I would argue that if you only want the final products its better not to generate the complete array of combinations. This method computes the products iteratively which reduces memory usage.

a = np.array([[1, 2, 3,  4,  5,  6],
              [7, 8, 9, 10, 11, 12]])

from itertools import product

m, n = a.shape
combs = product(*[range(m)]*n)
products = [np.product(a[comb, range(n)]) for comb in combs]
print(products)

[720, 1440, 1584, 3168, 1800, 3600, ... etc.

or using np.fromiter

products = np.fromiter((np.product(a[comb, range(n)]) for comb in combs), 
                       dtype=float, count=m**n)

CodePudding user response：

Here is another approach that finds all products recursively based on the number of columns in a. The output order is different from other answers; a benchmark below shows a 50x speedup over the next fastest answer.

def products(arr):
  # if there is one column left, return the last column
  if arr.shape[1] == 1:
    return arr[:,0]
  mid = arr.shape[1] // 2
  # return the flattened outer product of recursive calls
  return np.outer(products(arr[:,:mid]), products(arr[:,mid:])).ravel()

a = np.array([[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]])
result = products(a)
# [   720   1440   1584   3168   1800   3600   3960   7920   2160   4320
#    4752   9504   5400  10800  11880  23760   2880   5760   6336  12672
#    7200  14400  15840  31680   8640  17280  19008  38016  21600  43200
#   47520  95040   5040  10080  11088  22176  12600  25200  27720  55440
#   15120  30240  33264  66528  37800  75600  83160 166320  20160  40320
#   44352  88704  50400 100800 110880 221760  60480 120960 133056 266112
#  151200 302400 332640 665280]

Benchmark (4 rows, 10 columns):

import itertools

def with_meshgrid(a):  
  b = np.stack(np.meshgrid(*a.T), axis=-1).reshape(-1, len(a.T))
  res = np.prod(b, axis=-1)
  return res

def with_itertools1(a):
  m, n = a.shape
  combs = itertools.product(*[range(m)]*n)
  products = [np.product(a[comb, range(n)]) for comb in combs]
  return products

def with_itertools2(a):
  rowCount = a.shape[0]**a.shape[1]
  x = np.tile(np.arange(a.shape[1]), rowCount).reshape(-1, a.shape[1])
  y = np.array(list(itertools.product(*[range(a.shape[0])]*a.shape[1])))
  result = a[y, x]
  product = np.prod(a[y, x], axis=-1)
  return product

a = np.random.randint(0, 100, size=(4, 10))
# %timeit products(a)        # 3.69 ms
# %timeit with_meshgrid(a)   # 201 ms
# %timeit with_itertools1(a) # timeout
# %timeit with_itertools2(a) # 3.1 s
assert (np.sort(products(b)) == np.sort(with_meshgrid(b))).all()