Home > Back-end >  Numpy - Product of All Unique Combinations or Paths along an axis
Numpy - Product of All Unique Combinations or Paths along an axis

Time:02-18

Given a 2D array, I would like to find the product of all unique "combinations" or "paths" along the last axis.

If the starting array is

[[1, 2, 3, 4, 5, 6],
 [7, 8, 9, 10, 11, 12]]

Then the unique combinations along the last axis would be

[[1, 2, 3, 4, 5, 6],
 [1, 2, 3, 4, 5, 12],
 [1, 2, 3, 4, 11, 6],
 [1, 2, 3, 4, 11, 12],
 ...,
 [7, 8, 9, 10, 11, 6],
 [7, 8, 9, 10, 11, 12]]

And then you can apply numpy.prod(, axis=-1). However, is there a faster/more efficient way to retrieve these unique combinations without manually iterating through?

CodePudding user response:

One solution is to use itertools to generate all the possible indices and then use Numpy indirect indexing so to extract the actual paths:

a = np.array([[1, 2, 3,  4,  5,  6],
              [7, 8, 9, 10, 11, 12]])

rowCount = a.shape[0]**a.shape[1]
x = np.tile(np.arange(a.shape[1]), rowCount).reshape(-1, a.shape[1])
y = np.array(list(itertools.product(*[range(2)]*6)))
result = a[y, x]
product = np.prod(a[y, x], axis=-1)

Note that this solution is not very efficient with bigger arrays. In fact, as said in the comment, the number of possible paths grows exponentially (n**m where n, m = a.shape) and thus the required memory space grow in O(n**m * m). Even the size of the output product grows experientially (n**m).

CodePudding user response:

You can use this generalization of Cartessian product in numpy:

a = np.array([[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]])
np.stack(np.meshgrid(*a.T), axis=-1).reshape(-1, len(a.T))

Output:

array([[ 1,  2,  3,  4,  5,  6],
       [ 1,  2,  3,  4,  5, 12],
       [ 1,  2,  3,  4, 11,  6],
                            ...
       [ 7,  8,  9, 10,  5, 12],
       [ 7,  8,  9, 10, 11,  6],
       [ 7,  8,  9, 10, 11, 12]])

CodePudding user response:

I would argue that if you only want the final products its better not to generate the complete array of combinations. This method computes the products iteratively which reduces memory usage.

a = np.array([[1, 2, 3,  4,  5,  6],
              [7, 8, 9, 10, 11, 12]])

from itertools import product

m, n = a.shape
combs = product(*[range(m)]*n)
products = [np.product(a[comb, range(n)]) for comb in combs]
print(products)

[720, 1440, 1584, 3168, 1800, 3600, ... etc.

or using np.fromiter

products = np.fromiter((np.product(a[comb, range(n)]) for comb in combs), 
                       dtype=float, count=m**n)

CodePudding user response:

Here is another approach that finds all products recursively based on the number of columns in a. The output order is different from other answers; a benchmark below shows a 50x speedup over the next fastest answer.

def products(arr):
  # if there is one column left, return the last column
  if arr.shape[1] == 1:
    return arr[:,0]
  mid = arr.shape[1] // 2
  # return the flattened outer product of recursive calls
  return np.outer(products(arr[:,:mid]), products(arr[:,mid:])).ravel()

a = np.array([[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]])
result = products(a)
# [   720   1440   1584   3168   1800   3600   3960   7920   2160   4320
#    4752   9504   5400  10800  11880  23760   2880   5760   6336  12672
#    7200  14400  15840  31680   8640  17280  19008  38016  21600  43200
#   47520  95040   5040  10080  11088  22176  12600  25200  27720  55440
#   15120  30240  33264  66528  37800  75600  83160 166320  20160  40320
#   44352  88704  50400 100800 110880 221760  60480 120960 133056 266112
#  151200 302400 332640 665280]

Benchmark (4 rows, 10 columns):

import itertools

def with_meshgrid(a):  
  b = np.stack(np.meshgrid(*a.T), axis=-1).reshape(-1, len(a.T))
  res = np.prod(b, axis=-1)
  return res

def with_itertools1(a):
  m, n = a.shape
  combs = itertools.product(*[range(m)]*n)
  products = [np.product(a[comb, range(n)]) for comb in combs]
  return products

def with_itertools2(a):
  rowCount = a.shape[0]**a.shape[1]
  x = np.tile(np.arange(a.shape[1]), rowCount).reshape(-1, a.shape[1])
  y = np.array(list(itertools.product(*[range(a.shape[0])]*a.shape[1])))
  result = a[y, x]
  product = np.prod(a[y, x], axis=-1)
  return product

a = np.random.randint(0, 100, size=(4, 10))
# %timeit products(a)        # 3.69 ms
# %timeit with_meshgrid(a)   # 201 ms
# %timeit with_itertools1(a) # timeout
# %timeit with_itertools2(a) # 3.1 s
assert (np.sort(products(b)) == np.sort(with_meshgrid(b))).all()
  • Related