Stacking arrays with one different dimension using numpy

Let's say I have 2 arrays:

a = np.array([[[1],[1]],[[2],[2]],[[3],[3]],[[4],[4]]])
b = np.array(["a", "b", "c", "d"]).reshape(4,1,1)

Then a.shape is (4,2,1) and b.shape is (4,1,1). My desired output would look like this:

c = np.array([
[
    np.array([[1],[1]]), "a"
],
[
    np.array([[2],[2]]), "b"
],
[
    np.array([[3],[3]]), "c"
],
[
    np.array([[4],[4]]), "d"
],
])

I tried np.hstack and np.concatenate, but neither does quite what I want. I realize I can simply loop through a and b and build the array myself; I am just wondering whether there is a specific function that returns array c, or whether a loop is my best bet here.

CodePudding user response：

You can iterate over both arrays with zip in a one-line list comprehension:

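import numpy as np

a = np.array([[[1],[1]],[[2],[2]],[[3],[3]],[[4],[4]]])
b = np.array(["a", "b", "c", "d"])
# dtype=object keeps each sub-array and its label in separate cells; NumPy 1.24+ raises an error for this ragged input without it
c = np.array([[aa,bb] for aa,bb in zip(a,b)], dtype=object)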

CodePudding user response：

The closest I can get is np.column_stack([a, b]). However, this puts the strings inside the stacked array rather than beside each sub-array as in your example, and it converts the integers to strings (note the dtype in the output).

>>> np.column_stack([a,b])
array([[['1'],
        ['1'],
        ['a']],

       [['2'],
        ['2'],
        ['b']],

       [['3'],
        ['3'],
        ['c']],

       [['4'],
        ['4'],
        ['d']]], dtype='<U11')

CodePudding user response：

First remove the unnecessary dimensions from b:

In [203]: b1 = b.ravel()

Second, make a 4-element object-dtype array from a:

In [204]: a1 = np.empty(4, object)
In [205]: a1[:] = list(a)
In [206]: a1
Out[206]: 
array([array([[1],
              [1]]), array([[2],
                            [2]]), array([[3],
                                          [3]]), array([[4],
                                                        [4]])],
      dtype=object)

Now we can stack them on a new 2nd axis to make a (4,2) array:

In [207]: np.stack((a1,b1), axis=1)
Out[207]: 
array([[array([[1],
               [1]]), 'a'],
       [array([[2],
               [2]]), 'b'],
       [array([[3],
               [3]]), 'c'],
       [array([[4],
               [4]]), 'd']], dtype=object)
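
The two steps above can be wrapped in a small function. This is only a sketch: pair_with_labels is a hypothetical name, and it assumes a and b have the same length along the first axis and that b holds one label per entry of a.

```python
import numpy as np

def pair_with_labels(a, b):
    """Hypothetical helper: return an (n, 2) object array of [sub-array, label] rows."""
    labels = np.ravel(b)                     # drop the size-1 dimensions of b
    boxed = np.empty(len(a), dtype=object)   # one object cell per sub-array
    boxed[:] = list(a)                       # each cell now holds one sub-array of a
    return np.stack((boxed, labels), axis=1)

a = np.array([[[1], [1]], [[2], [2]], [[3], [3]], [[4], [4]]])
b = np.array(["a", "b", "c", "d"]).reshape(4, 1, 1)

c = pair_with_labels(a, b)
print(c.shape)    # (4, 2)
print(c[2, 1])    # c
```

Filling the object array through np.empty first, rather than calling np.array on the list directly, stops NumPy from merging the equally shaped sub-arrays back into one numeric array.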

The list comprehension is simpler, and probably faster, though it produces a nested list rather than an array:

In [209]: [[i,j.item()] for i,j in zip(a,b)]
Out[209]: 
[[array([[1],
         [1]]),
  'a'],
 [array([[2],
         [2]]),
  'b'],
 [array([[3],
         [3]]),
  'c'],
 [array([[4],
         [4]]),
  'd']]

Or make a structured array with 2 fields:

In [215]: c = np.zeros(4, dtype='O,U1')
In [216]: c
Out[216]: 
array([(0, ''), (0, ''), (0, ''), (0, '')],
      dtype=[('f0', 'O'), ('f1', '<U1')])
In [218]: c['f0'] = list(a)
In [219]: c['f1'] = b.ravel()
In [220]: c
Out[220]: 
array([(array([[1],
              [1]]), 'a'), (array([[2],
                                  [2]]), 'b'), (array([[3],
                                                      [3]]), 'c'),
       (array([[4],
              [4]]), 'd')], dtype=[('f0', 'O'), ('f1', '<U1')])

The first field can also be given a fixed (2,1) integer shape instead of object dtype:

In [221]: c = np.zeros(4, dtype=[('a',int,(2,1)),('b','U1')])
In [222]: c['a']=a    
In [224]: c['b']=b.ravel()
In [225]: c
Out[225]: 
array([([[1], [1]], 'a'), ([[2], [2]], 'b'), ([[3], [3]], 'c'),
       ([[4], [4]], 'd')], dtype=[('a', '<i8', (2, 1)), ('b', '<U1')])
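
Once the structured array exists, each field can be read back as an ordinary array. The following sketch reuses the field names 'a' and 'b' from the session above; the variable names values, labels, row and selected are only illustrative.

```python
import numpy as np

a = np.array([[[1], [1]], [[2], [2]], [[3], [3]], [[4], [4]]])
b = np.array(["a", "b", "c", "d"]).reshape(4, 1, 1)

# One record per row: a fixed-shape integer sub-array plus a one-character label.
c = np.zeros(4, dtype=[("a", int, (2, 1)), ("b", "U1")])
c["a"] = a
c["b"] = b.ravel()

values = c["a"]                    # numeric field, shape (4, 2, 1)
labels = c["b"]                    # string field, shape (4,)
row = c[2]                         # a single record
selected = c[c["b"] == "c"]["a"]   # sub-arrays whose label is "c", shape (1, 2, 1)

print(values.shape, labels.shape)
print(row["b"])
```

Because the 'a' field keeps a numeric dtype, values still supports vectorized arithmetic, which the object-dtype variants do not offer directly.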

The key to all these methods is understanding what you are trying to create. You aren't making a "normal" multidimensional array. You are trying to mix arrays and strings.
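
To make that concrete, here is a small sketch that builds the mixed array with a plain loop and then inspects it. The names labels, restored and labels_back are illustrative. Each cell of the (4,2) object array holds a reference to a whole Python object, so the numeric part has to be stacked back together before normal array operations apply.

```python
import numpy as np

a = np.array([[[1], [1]], [[2], [2]], [[3], [3]], [[4], [4]]])
labels = ["a", "b", "c", "d"]

# Build the (4, 2) object array cell by cell.
c = np.empty((len(a), 2), dtype=object)
for i in range(len(a)):
    c[i, 0] = a[i]        # a whole (2, 1) array stored in one cell
    c[i, 1] = labels[i]   # a plain string stored in the neighbouring cell

print(c.shape)            # (4, 2)
print(type(c[0, 0]))      # the cell holds an ndarray, not numbers

# Recover a regular numeric array and the labels from the two columns.
restored = np.stack(list(c[:, 0]))
labels_back = c[:, 1].tolist()
print(restored.shape)     # (4, 2, 1)
```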

You can concatenate to make a (4,3,1) array and preserve the ints by specifying dtype=object (the dtype argument of np.concatenate requires NumPy 1.20 or later). This joins a (4,2,1) array with a (4,1,1) array on the middle dimension, which is what concatenate (and the stack derivatives) does best.

In [263]: np.concatenate((a,b),axis=1, dtype=object)
Out[263]: 
array([[[1],
        [1],
        ['a']],

       [[2],
        [2],
        ['b']],

       [[3],
        [3],
        ['c']],

       [[4],
        [4],
        ['d']]], dtype=object)
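
If this (4,3,1) layout is used, the numbers and labels can be separated again by slicing the middle axis. This is a sketch that assumes NumPy 1.20 or later and that the labels always occupy the last position along axis 1; the names joined, values and labels are illustrative.

```python
import numpy as np

a = np.array([[[1], [1]], [[2], [2]], [[3], [3]], [[4], [4]]])
b = np.array(["a", "b", "c", "d"]).reshape(4, 1, 1)

# Join on the middle axis; object dtype keeps ints as ints and strings as strings.
joined = np.concatenate((a, b), axis=1, dtype=object)
print(joined.shape)                       # (4, 3, 1)

# Slice the two parts back out and restore their natural dtypes.
values = joined[:, :2, :].astype(int)     # shape (4, 2, 1)
labels = joined[:, 2, 0].astype(str)      # shape (4,)
print(values.shape, labels.shape)
```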