Finding the cartesian product of 2d arrays of different shapes and horizontally concatenating them r

Time:09-28

I am somewhat new to NumPy and am having trouble finding an efficient way to perform what I assume is a simple task. I suspect there is a direct way to do this in NumPy, but after quite a bit of searching I could not find one.

I have two 2D arrays, like so:

>>> import itertools
>>> import numpy as np
>>> ident2 = np.identity(2)
>>> ident3 = np.identity(3)
>>> ident2
array([[1., 0.],
       [0., 1.]])
>>> ident3
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

What I would like to create is an array like this: the Cartesian product of the rows of the two arrays above, with each pair of rows concatenated horizontally (a row of ident3 first, then a row of ident2):

array([[1, 0, 0, 1, 0],
       [1, 0, 0, 0, 1],
       [0, 1, 0, 1, 0],
       [0, 1, 0, 0, 1],
       [0, 0, 1, 1, 0],
       [0, 0, 1, 0, 1]])

So far I have been able to create the cartesian product using itertools.product like this:

>>> x = np.array([*itertools.product(ident3, ident2)], dtype=object)
>>> x
array([[array([1., 0., 0.]), array([1., 0.])],
       [array([1., 0., 0.]), array([0., 1.])],
       [array([0., 1., 0.]), array([1., 0.])],
       [array([0., 1., 0.]), array([0., 1.])],
       [array([0., 0., 1.]), array([1., 0.])],
       [array([0., 0., 1.]), array([0., 1.])]], dtype=object)

But I am having trouble figuring out a readable, efficient way to join the arrays along the rows into a final array. This works:

>>> np.stack([np.concatenate(arrays) for arrays in x])
array([[1., 0., 0., 1., 0.],
       [1., 0., 0., 0., 1.],
       [0., 1., 0., 1., 0.],
       [0., 1., 0., 0., 1.],
       [0., 0., 1., 1., 0.],
       [0., 0., 1., 0., 1.]])

The above is very readable, but because it relies on a Python-level list comprehension rather than only native NumPy operations, I assume it will be slow.

Below is the only method I've found that works without using a list comprehension:

>>> np.stack(np.array_split(np.hstack(np.concatenate(x)), 6))
array([[1., 0., 0., 1., 0.],
       [1., 0., 0., 0., 1.],
       [0., 1., 0., 1., 0.],
       [0., 1., 0., 0., 1.],
       [0., 0., 1., 1., 0.],
       [0., 0., 1., 0., 1.]])

But it is extremely convoluted. How could future me ever come back to that and understand what in the world is going on? It also hard-codes the number of output rows (6) and still requires the separate, initial itertools.product step, which I assume a more efficient native NumPy method would not need.
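For anyone trying to read that one-liner, here is a rough sketch of the same chain split into named steps. The intermediate names (flat_pairs, flat_values, n_rows) are made up for illustration, and it assumes the product is taken with ident3 first so the rows match the desired output. The dtype=object argument is there because recent NumPy versions refuse to build a ragged array without it.

```python
import itertools
import numpy as np

ident2 = np.identity(2)
ident3 = np.identity(3)

# Object array of shape (6, 2): each cell holds one 1-D row array.
x = np.array([*itertools.product(ident3, ident2)], dtype=object)

# Step 1: join the six 2-element rows end to end -> 12 row arrays in a 1-D object array.
flat_pairs = np.concatenate(x)

# Step 2: join the 12 row arrays into one flat float array of 30 values.
flat_values = np.hstack(flat_pairs)

# Step 3: cut the flat array back into one chunk per output row.
n_rows = len(ident3) * len(ident2)  # 6, instead of a hard-coded constant
rows = np.array_split(flat_values, n_rows)

# Step 4: stack the chunks into the final 2-D array.
result = np.stack(rows)
print(result.shape)
```

Splitting it up this way makes it easier to follow, but it is still the same round trip through Python objects, so it is unlikely to be any faster.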

There has to be a better way. What would be the canonical way to construct the row-by-row concatenated cartesian product of these two 2D arrays?

CodePudding user response:

How about using a mix of repeat and tile (which itself uses repeat):

In [75]: ident2 = np.identity(2)
    ...: ident3 = np.identity(3)
In [76]: np.repeat(ident3,repeats=2,axis=0)
Out[76]: 
array([[1., 0., 0.],
       [1., 0., 0.],
       [0., 1., 0.],
       [0., 1., 0.],
       [0., 0., 1.],
       [0., 0., 1.]])
In [77]: np.tile(ident2,(3,1))
Out[77]: 
array([[1., 0.],
       [0., 1.],
       [1., 0.],
       [0., 1.],
       [1., 0.],
       [0., 1.]])
In [78]: np.hstack((__, _))   # IPython shortcuts: __ is Out[76], _ is Out[77]
Out[78]: 
array([[1., 0., 0., 1., 0.],
       [1., 0., 0., 0., 1.],
       [0., 1., 0., 1., 0.],
       [0., 1., 0., 0., 1.],
       [0., 0., 1., 1., 0.],
       [0., 0., 1., 0., 1.]])