I have a NumPy array of 2-d NumPy arrays (an array of word embeddings of sentences), e.g.
embedded_sentences =
np.array([
np.array([[1,2,3],[0,0,0]]),
np.array([[10,20,30],[40,50,60]])
])
I need to convert this into a 3-d array like this:
np.array([
[[1,2,3],[0,0,0]],
[[10,20,30],[40,50,60]]
])
i.e. from shape (2,) (two sentences) to shape (2, 2, 3) (two sentences of two words each, where each word is embedded in 3 dimensions), so that I can convert it to a PyTorch tensor with torch.from_numpy(embedded_sentences).
I have tried np.vstack(embedded_sentences) and np.dstack(embedded_sentences), but neither gives the (2, 2, 3) shape I need.
EDIT
I can do it this way:
embedded_sentences = np.dstack(embedded_sentences)
embedded_sentences = np.transpose(embedded_sentences,(2,0,1))
which works, but isn't pretty.
CodePudding user response:
A copy-n-paste of your code produces a 3d array right away.
But if we go the extra step of making sure it's a 1d object dtype array:
In [16]: x = np.empty(2,object); x[:] = list(embedded_sentences)
In [17]: x
Out[17]:
array([array([[1, 2, 3],
[0, 0, 0]]), array([[10, 20, 30],
[40, 50, 60]])], dtype=object)
That's effectively a list of 2 arrays. We can join those, the same way np.array does, using np.stack:
In [18]: np.stack(x)
Out[18]:
array([[[ 1, 2, 3],
[ 0, 0, 0]],
[[10, 20, 30],
[40, 50, 60]]])
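The session above can be wrapped in a small helper. This is only a minimal sketch: the name `stack_sentences` is hypothetical, and it assumes every sentence array has the same shape (np.stack raises a ValueError otherwise).

```python
import numpy as np

def stack_sentences(sentences):
    """Join a sequence of equally shaped 2-d arrays into one 3-d array.

    Hypothetical helper, not part of NumPy.
    """
    # np.stack adds a new leading axis and concatenates along it.
    return np.stack(list(sentences))

# Build a genuine 1-d object array that holds two (2, 3) arrays.
x = np.empty(2, dtype=object)
x[0] = np.array([[1, 2, 3], [0, 0, 0]])
x[1] = np.array([[10, 20, 30], [40, 50, 60]])

stacked = stack_sentences(x)
print(x.shape, x.dtype)  # (2,) object
print(stacked.shape)     # (2, 2, 3)
```

The result has a numeric dtype rather than object, which is the kind of array torch.from_numpy expects.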
I probably should have insisted on seeing what's wrong with the alternatives you tried, such as vstack and dstack:
In [19]: np.vstack(x)
Out[19]:
array([[ 1, 2, 3],
[ 0, 0, 0],
[10, 20, 30],
[40, 50, 60]])
In [20]: np.dstack(x)
Out[20]:
array([[[ 1, 10],
[ 2, 20],
[ 3, 30]],
[[ 0, 40],
[ 0, 50],
[ 0, 60]]])
As you found, the dstack result can be transposed. The vstack result can be reshaped.
All these stack functions tweak some dimensions and then call np.concatenate.
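To make that concrete, here is a rough sketch of the three routes to the same (2, 2, 3) array. The variable names are only illustrative, and the explicit np.concatenate call mirrors the idea behind np.stack rather than its exact implementation.

```python
import numpy as np

a = np.array([[1, 2, 3], [0, 0, 0]])
b = np.array([[10, 20, 30], [40, 50, 60]])

# np.stack idea: give each array a new leading axis, then concatenate on it.
via_concatenate = np.concatenate([a[np.newaxis], b[np.newaxis]], axis=0)

# np.vstack concatenates along the existing first axis, giving (4, 3);
# reshaping restores the sentence axis.
via_vstack = np.vstack([a, b]).reshape(-1, *a.shape)

# np.dstack joins along a new last axis, giving (2, 3, 2);
# moving that axis to the front gives (2, 2, 3).
via_dstack = np.transpose(np.dstack([a, b]), (2, 0, 1))

expected = np.stack([a, b])
results = (via_concatenate, via_vstack, via_dstack)
print(all(np.array_equal(expected, r) for r in results))  # True
```

Of the three, np.stack states the intent most directly, since it needs no follow-up reshape or transpose.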
CodePudding user response:
Your input already has the desired shape and elements. Please see the code snippet below:
In [1]: import numpy as np
In [2]: embedded_sentences = \
...: np.array([
...: np.array([[1,2,3],[0,0,0]]),
...: np.array([[10,20,30],[40,50,60]])
...:
...: ])
In [3]: embedded_sentences.shape
Out[3]: (2, 2, 3)
I guess you meant the following:
I have a list of 2-d NumPy arrays (an array of word embeddings of sentences), e.g.
In [4]: embedded_sentences = \
...: [
...: np.array([[1,2,3],[0,0,0]]),
...: np.array([[10,20,30],[40,50,60]])
...: ]
If yes, then np.array should be sufficient, as shown below:
In [5]: embedded_sentences = np.array(embedded_sentences)
In [6]: embedded_sentences
Out[6]:
array([[[ 1, 2, 3],
[ 0, 0, 0]],
[[10, 20, 30],
[40, 50, 60]]])
In [7]: embedded_sentences.shape
Out[7]: (2, 2, 3)
However, in your post you mentioned that this is going to be an input to PyTorch. Therefore, you can build the tensor with PyTorch directly, as shown below:
In [8]: import torch
In [9]: t = torch.tensor(embedded_sentences)
In [10]: t
Out[10]:
tensor([[[ 1, 2, 3],
[ 0, 0, 0]],
[[10, 20, 30],
[40, 50, 60]]])
In [11]: t.shape
Out[11]: torch.Size([2, 2, 3])
In [12]: t.dtype
Out[12]: torch.int64