Use numpy to stack combinations of a 1D and 2D array

Time:02-25

I have 2 numpy arrays, one 2D and the other 1D, for example like this:

import numpy as np

a = np.array(
    [
        [1, 2],
        [3, 4],
        [5, 6]
    ]
)

b = np.array(
    [7, 8, 9, 10]
)

I want every combination of a row of a with an element of b. In other words, a is treated like a 1D array of rows: each row of a stays intact and gets one item of b appended. It would look something like this:

>>> combine1d(a, b)
[ [1 2 7] [1 2 8] [1 2 9] [1 2 10] 
  [3 4 7] [3 4 8] [3 4 9] [3 4 10]
  [5 6 7] [5 6 8] [5 6 9] [5 6 10] ]

I know that there are slow solutions for this (like a for loop), but I need a fast solution to this as I am working with datasets with millions of integers.
Any ideas?

CodePudding user response:

This is one of those cases where it's easier to build a higher dimensional object, and then fix the axes when you're done. The first two dimensions are the length of b and the length of a. The third dimension is the number of elements in each row of a plus 1. We can then use broadcasting to fill in this array.

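x, y = a.shape
z, = b.shape
# One slot per (element of b, row of a) pair, with one extra column for b.
# np.empty defaults to float64, so pass a dtype to keep integer inputs integer.
result = np.empty((z, x, y + 1), dtype=np.result_type(a, b))
result[..., :y] = a          # broadcast the rows of a across the first axis
result[..., y] = b[:, None]  # fill the last column with the values of b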

At this point, to get the exact answer you asked for, you'll need to swap the first two axes, and then merge those two axes into a single axis.

result.swapaxes(0, 1).reshape(-1, y + 1)
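As a variation on the same broadcasting idea, here is a minimal sketch that allocates the array in (rows of a, elements of b) order from the start, so no axis swap is needed. The function name combine1d is borrowed from the question; the use of np.result_type to pick the output dtype is my own assumption, not part of the answer above.

```python
import numpy as np

def combine1d(a, b):
    """Pair every row of the 2D array `a` with every element of the 1D array `b`."""
    x, y = a.shape
    (z,) = b.shape
    # Axes are (row of a, element of b, column), so a plain reshape
    # already yields the rows grouped by row of a.
    out = np.empty((x, z, y + 1), dtype=np.result_type(a, b))
    out[..., :y] = a[:, None, :]  # copy each row of a across the b axis
    out[..., y] = b               # copy b across the a axis
    return out.reshape(-1, y + 1)

a = np.array([[1, 2], [3, 4], [5, 6]])
b = np.array([7, 8, 9, 10])
combined = combine1d(a, b)
print(combined)
```

Skipping the final reshape leaves a 3D array of shape (3, 4, 3), which may be closer to the grouped layout shown in the question.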

CodePudding user response:

This is a very "scotch tape" solution:

import numpy as np

a = np.array(
    [
        [1, 2],
        [3, 4],
        [5, 6]
    ]
)

b = np.array(
    [7, 8, 9, 10]
)

z = []

# Loop over the rows of a first so the combinations are grouped by row of a.
for y in a:
    for x in b:
        z.append(np.append(y, x))

np.array(z).reshape(len(a), len(b), -1)
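Since the question asks for something faster than a Python loop, here is a rough loop-free sketch that builds the same rows with np.repeat and np.tile. The names rows, vals, flat and grouped are only for illustration.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])
b = np.array([7, 8, 9, 10])

# Repeat each row of a once per element of b: shape (12, 2).
rows = np.repeat(a, len(b), axis=0)
# Lay b end to end once per row of a: shape (12,).
vals = np.tile(b, len(a))
# Append vals as the last column: shape (12, 3).
flat = np.column_stack((rows, vals))
# Optional: regroup into one block per row of a, shape (3, 4, 3).
grouped = flat.reshape(len(a), len(b), -1)
```

This makes full copies of both inputs before joining them, so on very large arrays it will likely use more memory than filling a preallocated array.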

CodePudding user response:

You can use np.c_ to join two arrays column-wise. I also used np.full to turn each element of the second array (b) into a column. The result looks like this:

result = [np.c_[a, np.full((a.shape[0],1), x)] for x in b]
result

Output

[array([[1, 2, 7],
        [3, 4, 7],
        [5, 6, 7]]),

 array([[1, 2, 8],
        [3, 4, 8],
        [5, 6, 8]]),

 array([[1, 2, 9],
        [3, 4, 9],
        [5, 6, 9]]),

 array([[ 1,  2, 10],
        [ 3,  4, 10],
        [ 5,  6, 10]])]

The output might look messy, but it contains the same rows as your desired output, grouped by the element of b instead of by the row of a. To check, you can print the first element of the result list:

print(result[0])

Output

[[1 2 7]
 [3 4 7]
 [5 6 7]]
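If you need a single array in the order shown in the question rather than a list of arrays, one possible follow-up is to stack the list along a new middle axis and flatten it. This is a sketch only; per_b, stacked and flat are names I made up for the example.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])
b = np.array([7, 8, 9, 10])

# Same list comprehension as above: one (3, 3) array per element of b.
per_b = [np.c_[a, np.full((a.shape[0], 1), x)] for x in b]

# Stack along a new middle axis so that stacked[i, j] is row i of a plus b[j].
stacked = np.stack(per_b, axis=1)           # shape (3, 4, 3)
flat = stacked.reshape(-1, a.shape[1] + 1)  # shape (12, 3), grouped by row of a
```

The list comprehension still loops over b in Python, so for a very long b this will probably be slower than a fully broadcast approach.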