seeking yet another numpy stacking function


Just this:

>>> a1
array([[0],
       [1],
       [2],
       [3],
       [4]])
>>> b2                                                                    
array([[100, 101],
       [102, 103],
       [104, 105],
       [106, 107],
       [108, 109]])

I want to stack them side by side in a way that results in:

array([[[0], [100, 101]],
       [[1], [102, 103]],
       [[2], [104, 105]],
       [[3], [106, 107]],
       [[4], [108, 109]]])

I already figured out that hstack flattens each pair of rows into a single row like [0, 100, 101], and that dstack requires the arrays to have the same shape.
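For reference, here is what hstack actually gives for these two arrays:

>>> np.hstack([a1, b2])
array([[  0, 100, 101],
       [  1, 102, 103],
       [  2, 104, 105],
       [  3, 106, 107],
       [  4, 108, 109]])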

But "there's always a way in numpy", I just haven't found it.

CodePudding user response:

Unfortunately, there is no proper way of creating these "ragged" tensors in NumPy, even though they are used quite regularly for tasks like data generators for deep learning models. I am assuming you are using them for such a task and understand the limitations of ndarrays, as @hpaulj mentioned.

As mentioned before, the core issue here is that the dimensions you need are not what NumPy expects for its ndarray objects. Every axis must have a consistent number of elements, whereas in your example one axis has elements of two different lengths (1 and 2).



All is not lost, however. There are a few ways of handling this:

Using NumPy to store array objects

It's a crude way of doing this, but it works when you have features that need an internal list/tuple/array structure for each value.

>>> np.array(list(zip(a1, b2)))
array([[array([0]), array([100, 101])],
       [array([1]), array([102, 103])],
       [array([2]), array([104, 105])],
       [array([3]), array([106, 107])],
       [array([4]), array([108, 109])]], dtype=object)
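Note that on recent NumPy releases (roughly 1.24 and later), building an array from ragged nested sequences without an explicit object dtype raises an error instead of just a deprecation warning, so depending on your version you may need:

np.array(list(zip(a1, b2)), dtype=object)

which produces the same (5, 2) object array as above.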

Using ragged tensors from TensorFlow

There are many ways to create ragged tensors, but I will just show how to convert the object array from above into one.

import tensorflow as tf

tf.ragged.constant(np.array(list(zip(a1, b2))))

<tf.RaggedTensor [[[0], [100, 101]], 
                  [[1], [102, 103]], 
                  [[2], [104, 105]], 
                  [[3], [106, 107]], 
                  [[4], [108, 109]]]>

The advantage here is that you get the full flexibility of the tensor operations TensorFlow allows on ragged tensors, including passing them through batching data generators for your models.
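If you only need the ragged tensor, you can also skip the intermediate object array and build it straight from nested Python lists; a small sketch using the same a1 and b2:

tf.ragged.constant([[x, y] for x, y in zip(a1.tolist(), b2.tolist())])

which yields an equivalent RaggedTensor.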

CodePudding user response:

Look at what happens when I try to make an array from your desired result:

In [8]: arr = np.array([[[0], [100, 101]],
   ...:        [[1], [102, 103]],
   ...:        [[2], [104, 105]],
   ...:        [[3], [106, 107]],
   ...:        [[4], [108, 109]]])
C:\Users\paul\AppData\Local\Temp\ipykernel_6620\2695759424.py:1: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  arr = np.array([[[0], [100, 101]],

In [9]: arr
Out[9]: 
array([[list([0]), list([100, 101])],
       [list([1]), list([102, 103])],
       [list([2]), list([104, 105])],
       [list([3]), list([106, 107])],
       [list([4]), list([108, 109])]], dtype=object)

That's a (5,2) shape, object dtype. Is that really what you want?

Here's a way to make such an array of lists:

In [22]: x=np.empty((5,2),object)
In [23]: x[:,0]=a1.tolist()
In [24]: x[:,1]=b2.tolist()

In [25]: x
Out[25]: 
array([[list([0]), list([100, 101])],
       [list([1]), list([102, 103])],
       [list([2]), list([104, 105])],
       [list([3]), list([106, 107])],
       [list([4]), list([108, 109])]], dtype=object)
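For completeness, a more explicit loop-based sketch that builds the same (5, 2) object array of lists, in case the slice-assignment idiom ever misbehaves on your NumPy version:

x = np.empty((5, 2), dtype=object)
for i in range(len(a1)):
    x[i, 0] = a1[i].tolist()   # e.g. [0]
    x[i, 1] = b2[i].tolist()   # e.g. [100, 101]

Each cell of an object-dtype array holds an arbitrary Python object, so the lists are stored as-is.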