Convert array of 2d-arrays to one 3d-array

I have a numpy array of 2d numpy arrays (an array of word embeddings of sentences), e.g.:

import numpy as np

# each element is one sentence: 2 words x 3 embedding dimensions
embedded_sentences = np.array([
    np.array([[1, 2, 3], [0, 0, 0]]),
    np.array([[10, 20, 30], [40, 50, 60]]),
])

I need to convert this to a 3d-array like this:

np.array([
    [[1, 2, 3], [0, 0, 0]],
    [[10, 20, 30], [40, 50, 60]],
])

i.e. from shape (2,) (two sentences) to shape (2, 2, 3) (two sentences of two words each, where each word is embedded in 3 dimensions), so that I can convert it to a PyTorch tensor with torch.from_numpy(embedded_sentences).

I have tried np.vstack(embedded_sentences) and np.dstack(embedded_sentences), but neither gives the shape I need.
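A shape of (2,) means the outer array has object dtype, with each element holding a separate 2-D array; torch.from_numpy does not accept object arrays, which is why the conversion fails. As a sketch, assuming the question's values, such an array can be reproduced by filling an empty object array one element at a time (the list name `sentences` is illustrative):

```python
import numpy as np

# Two sentence embeddings from the question: 2 words x 3 dimensions each.
sentences = [
    np.array([[1, 2, 3], [0, 0, 0]]),
    np.array([[10, 20, 30], [40, 50, 60]]),
]

# Build a 1-D object array explicitly. Assigning element by element keeps
# NumPy from combining the equal-shaped arrays into one 3-D array.
embedded_sentences = np.empty(len(sentences), dtype=object)
for i, s in enumerate(sentences):
    embedded_sentences[i] = s

print(embedded_sentences.shape)  # (2,)
print(embedded_sentences.dtype)  # object
```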

EDIT

I can do it this way:

embedded_sentences = np.dstack(embedded_sentences)
embedded_sentences = np.transpose(embedded_sentences,(2,0,1))

which works, but isn't pretty.
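To see why the transpose is needed, here is a minimal sketch (using an illustrative list `sentences` with the question's values) comparing the dstack-plus-transpose workaround with np.stack:

```python
import numpy as np

# Illustrative list standing in for the question's sentence embeddings.
sentences = [
    np.array([[1, 2, 3], [0, 0, 0]]),
    np.array([[10, 20, 30], [40, 50, 60]]),
]

# The workaround from the question: dstack puts the sentence index last,
# giving shape (2, 3, 2) here, then transpose moves it to the front.
via_dstack = np.transpose(np.dstack(sentences), (2, 0, 1))

# np.stack adds the new sentence axis at the front directly.
via_stack = np.stack(sentences)

print(via_dstack.shape)                       # (2, 2, 3)
print(np.array_equal(via_dstack, via_stack))  # True
```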

Answer:

Copying and pasting your code produces a 3d array right away: np.array joins the two equal-shaped (2, 3) arrays into one (2, 2, 3) array.

But if we take the extra step of making sure it's a 1d object-dtype array:

In [16]: x = np.empty(2,object); x[:] = list(embedded_sentences)
In [17]: x
Out[17]: 
array([array([[1, 2, 3],
              [0, 0, 0]]), array([[10, 20, 30],
                                  [40, 50, 60]])], dtype=object)

That's effectively a list of 2 arrays. We can join them along a new leading axis with np.stack, the same way np.array does:

In [18]: np.stack(x)
Out[18]: 
array([[[ 1,  2,  3],
        [ 0,  0,  0]],

       [[10, 20, 30],
        [40, 50, 60]]])
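np.stack only needs a sequence of equal-shaped arrays, so it works the same on the object array x as on a plain list. A minimal sketch, rebuilding x element by element with the question's values, confirming that the result has a numeric dtype rather than object, which is what torch.from_numpy needs:

```python
import numpy as np

# Rebuild the 1-D object array from the answer, one element at a time.
x = np.empty(2, dtype=object)
x[0] = np.array([[1, 2, 3], [0, 0, 0]])
x[1] = np.array([[10, 20, 30], [40, 50, 60]])

# np.stack iterates over x and joins the elements along a new first axis.
result = np.stack(x)

print(result.shape)            # (2, 2, 3)
print(result.dtype == object)  # False: a plain numeric array
```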

I probably should have asked you to show what went wrong with the alternatives you tried, such as vstack and dstack:

In [19]: np.vstack(x)
Out[19]: 
array([[ 1,  2,  3],
       [ 0,  0,  0],
       [10, 20, 30],
       [40, 50, 60]])

In [20]: np.dstack(x)
Out[20]: 
array([[[ 1, 10],
        [ 2, 20],
        [ 3, 30]],

       [[ 0, 40],
        [ 0, 50],
        [ 0, 60]]])

As you found, the dstack result can be transposed to get the (2, 2, 3) array. The vstack result can be reshaped instead.
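A sketch of both fixes, assuming every sentence has the same (words, dims) shape; `sentences` is an illustrative name. np.moveaxis is shown as an alternative spelling of the transpose:

```python
import numpy as np

sentences = [
    np.array([[1, 2, 3], [0, 0, 0]]),
    np.array([[10, 20, 30], [40, 50, 60]]),
]
n = len(sentences)
words, dims = sentences[0].shape  # assumes every sentence has this shape

# vstack concatenates along the word axis, giving (n * words, dims);
# reshaping splits that axis back into (n, words).
from_vstack = np.vstack(sentences).reshape(n, words, dims)

# dstack puts the sentence axis last, (words, dims, n); moveaxis brings it
# to the front, equivalent to transpose(..., (2, 0, 1)).
from_dstack = np.moveaxis(np.dstack(sentences), -1, 0)

expected = np.stack(sentences)
print(np.array_equal(from_vstack, expected))  # True
print(np.array_equal(from_dstack, expected))  # True
```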

All these stack functions adjust the dimensions of their inputs in some way and then call np.concatenate.
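For instance, stacking along axis 0 behaves like giving each array a length-1 leading axis and concatenating along it. A small sketch of that equivalence (it mirrors the idea, not NumPy's exact source; `sentences` is illustrative):

```python
import numpy as np

sentences = [
    np.array([[1, 2, 3], [0, 0, 0]]),
    np.array([[10, 20, 30], [40, 50, 60]]),
]

# Give each (2, 3) array a new leading axis -> (1, 2, 3), then join on it.
manual = np.concatenate([s[np.newaxis, ...] for s in sentences], axis=0)

# vstack, by contrast, concatenates along the existing first axis.
flat = np.concatenate(sentences, axis=0)

print(manual.shape)                                 # (2, 2, 3)
print(np.array_equal(manual, np.stack(sentences)))  # True
print(np.array_equal(flat, np.vstack(sentences)))   # True
```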

Answer:

Your input already has the desired shape and elements. See the snippet below:

In [1]: import numpy as np

In [2]: embedded_sentences = \
   ...: np.array([
   ...:    np.array([[1,2,3],[0,0,0]]),
   ...:    np.array([[10,20,30],[40,50,60]])
   ...: 
   ...: ])

In [3]: embedded_sentences.shape
Out[3]: (2, 2, 3)

I guess you meant the following:

I have a ~~numpy array~~ list of 2d numpy arrays (an array of word embeddings of sentences), e.g.:

In [4]: embedded_sentences = \
   ...: [
   ...:    np.array([[1,2,3],[0,0,0]]),
   ...:    np.array([[10,20,30],[40,50,60]])
   ...: ]

If so, np.array is sufficient, as shown below:

In [5]: embedded_sentences = np.array(embedded_sentences)

In [6]: embedded_sentences
Out[6]: 
array([[[ 1,  2,  3],
        [ 0,  0,  0]],

       [[10, 20, 30],
        [40, 50, 60]]])

In [7]: embedded_sentences.shape
Out[7]: (2, 2, 3)
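np.stack gives the same result on the list and makes the equal-shape requirement explicit: it raises a ValueError when the arrays differ in shape. A sketch, where the third sentence with 3 words is a hypothetical ragged input, not from the question:

```python
import numpy as np

sentences = [
    np.array([[1, 2, 3], [0, 0, 0]]),
    np.array([[10, 20, 30], [40, 50, 60]]),
]
print(np.stack(sentences).shape)  # (2, 2, 3), same as np.array(sentences)

# Hypothetical ragged input: the third sentence has 3 words instead of 2.
ragged = sentences + [np.array([[7, 8, 9], [1, 1, 1], [2, 2, 2]])]
try:
    np.stack(ragged)
except ValueError as err:
    print("np.stack failed:", err)
```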

However, since you mentioned that this will be an input to PyTorch, you can also create the tensor directly with PyTorch:

In [8]: import torch

In [9]: t = torch.tensor(embedded_sentences)

In [10]: t
Out[10]: 
tensor([[[ 1,  2,  3],
         [ 0,  0,  0]],

        [[10, 20, 30],
         [40, 50, 60]]])

In [11]: t.shape
Out[11]: torch.Size([2, 2, 3])

In [12]: t.dtype
Out[12]: torch.int64