Data type issue while appending from a for loop

I am using this for loop to split a dataset into groups, but converting the list "y" into an array produces a warning.

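import numpy as np

def to_sequences(dataset, seq_size=1):
    x = []
    y = []

    for i in range(len(dataset)-seq_size):
        # input window: rows i .. i+seq_size-1 of column 0
        window = dataset[i:(i+seq_size), 0]
        x.append(window)
        # target window: the 5 rows that follow the input window
        window2 = dataset[(i+seq_size):i+seq_size+5, 0]
        y.append(window2)

    return np.array(x),np.array(y)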

seq_size = 5
trainX, trainY = to_sequences(train, seq_size)
print("Shape of trainX: {}".format(trainX.shape))
print("Shape of trainY: {}".format(trainY.shape))

And this is the warning I get:

VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  return np.array(x),np.array(y)

I couldn't figure out why it works for 'x' but not for 'y'. Any idea?

Answer:

Here is your function, tried on a small test dataset:

In [247]: dataset = np.arange(20)
In [248]: def to_sequences(dataset, seq_size=1):
     ...:     x = []
     ...:     y = []
     ...:     for i in range(len(dataset)-seq_size):
     ...:         window = dataset[i:(i+seq_size), 0]
     ...:         x.append(window)
     ...:         window2 = dataset[(i+seq_size):i+seq_size+5, 0]
     ...:         y.append(window2)
     ...:     return np.array(x),np.array(y)
     ...:

And a sample run. dataset[:,None] turns the 1-D range into a (20,1) column, since the function indexes column 0:

In [250]: to_sequences(dataset[:,None], 5)
<ipython-input-248-176eb762993c>:9: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  return np.array(x),np.array(y)
Out[250]: 
(array([[ 0,  1,  2,  3,  4],
        [ 1,  2,  3,  4,  5],
        [ 2,  3,  4,  5,  6],
        [ 3,  4,  5,  6,  7],
        [ 4,  5,  6,  7,  8],
        [ 5,  6,  7,  8,  9],
        [ 6,  7,  8,  9, 10],
        [ 7,  8,  9, 10, 11],
        [ 8,  9, 10, 11, 12],
        [ 9, 10, 11, 12, 13],
        [10, 11, 12, 13, 14],
        [11, 12, 13, 14, 15],
        [12, 13, 14, 15, 16],
        [13, 14, 15, 16, 17],
        [14, 15, 16, 17, 18]]),
 array([array([5, 6, 7, 8, 9]), array([ 6,  7,  8,  9, 10]),
        array([ 7,  8,  9, 10, 11]), array([ 8,  9, 10, 11, 12]),
        array([ 9, 10, 11, 12, 13]), array([10, 11, 12, 13, 14]),
        array([11, 12, 13, 14, 15]), array([12, 13, 14, 15, 16]),
        array([13, 14, 15, 16, 17]), array([14, 15, 16, 17, 18]),
        array([15, 16, 17, 18, 19]), array([16, 17, 18, 19]),
        array([17, 18, 19]), array([18, 19]), array([19])], dtype=object))

The first array is (n,5) with int dtype (here n=15). The second has object dtype and contains arrays: most of them have shape (5,), but the last four are (4,), (3,), (2,) and (1,).
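A quick way to confirm this is to look at the lengths of the target windows before calling np.array. The sketch below rebuilds y for the same 20-element test dataset used in the sample run (the list comprehension is just a compact restatement of the loop). Note that on NumPy 1.24 and later the ragged np.array(y) call raises a ValueError instead of emitting this warning, so there you must either pass dtype=object explicitly or fix the slicing.

```python
import numpy as np

# Same setup as the sample run: a (20, 1) column, seq_size = 5,
# and target windows of 5 rows each.
dataset = np.arange(20)[:, None]
seq_size = 5
y = [dataset[i + seq_size:i + seq_size + 5, 0]
     for i in range(len(dataset) - seq_size)]

# More than one distinct window length means the list is ragged and
# cannot become a regular 2-D int array.
lengths = [len(w) for w in y]
print(sorted(set(lengths)))      # [1, 2, 3, 4, 5]

# With dtype=object, as the warning suggests, numpy builds a 1-D array of arrays.
y_obj = np.array(y, dtype=object)
print(y_obj.shape, y_obj.dtype)  # (15,) object
```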

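dataset[(i+seq_size):i+seq_size+5, 0] slices past the end of dataset for the last few values of i: with 20 rows and seq_size=5, i runs up to 14, so the stop index i+seq_size+5 reaches 24. Python/numpy allows that, but the result is silently truncated.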
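The truncation can be located with plain arithmetic. This sketch uses the sample-run sizes (20 rows, seq_size=5) and a hypothetical name, horizon, for the hard-coded 5 in the y slice. For every i whose stop index passes the end, it records i, the stop index requested, and how many items actually come back.

```python
# Sample-run sizes; horizon is a hypothetical name for the hard-coded 5.
n, seq_size, horizon = 20, 5, 5

short = []
for i in range(n - seq_size):
    start, stop = i + seq_size, i + seq_size + horizon
    if stop > n:
        # The slice is clipped at n, so only n - start items are returned.
        short.append((i, stop, n - start))

print(short)
# [(11, 21, 4), (12, 22, 3), (13, 23, 2), (14, 24, 1)]
```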

You'll have to rethink the y slicing if you want an (n,5) shaped array.
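One possible fix, sketched under the assumption that dropping the last few incomplete samples is acceptable, is to shorten the loop so the target window always fits. The extra horizon parameter is hypothetical (it replaces the hard-coded 5). The cross-check at the end uses NumPy's sliding_window_view (available since NumPy 1.20) as a swapped-in, loop-free alternative; it is not part of the original code.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def to_sequences(dataset, seq_size=1, horizon=5):
    """Split column 0 of a 2-D dataset into (input, target) windows.

    horizon is a hypothetical parameter replacing the hard-coded 5
    in the question's y slice.
    """
    x = []
    y = []
    # Stop early enough that the target slice i+seq_size : i+seq_size+horizon
    # never runs past the end of dataset, so every window has full length.
    for i in range(len(dataset) - seq_size - horizon + 1):
        x.append(dataset[i:i + seq_size, 0])
        y.append(dataset[i + seq_size:i + seq_size + horizon, 0])
    return np.array(x), np.array(y)


trainX, trainY = to_sequences(np.arange(20)[:, None], seq_size=5)
print(trainX.shape, trainY.shape)  # (11, 5) (11, 5)
print(trainY[-1])                  # [15 16 17 18 19]

# Loop-free cross-check: every run of seq_size + horizon consecutive values,
# split into the input part and the target part.
windows = sliding_window_view(np.arange(20), 5 + 5)
print(np.array_equal(windows[:, :5], trainX),
      np.array_equal(windows[:, 5:], trainY))  # True True
```

With the test dataset this yields 11 samples instead of 15: the four whose target window would run off the end are dropped. If every starting position matters, padding the short y windows to a fixed length would be the alternative.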

Slicing past the end of a list behaves the same way:

In [252]: [1,2,3,4,5][1:4]
Out[252]: [2, 3, 4]
In [253]: [1,2,3,4,5][3:6]
Out[253]: [4, 5]
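numpy arrays clip slices in the same way, which is why the y slice returned short windows rather than raising. Only a single out-of-range index raises an error. A minimal check:

```python
import numpy as np

a = np.arange(1, 6)   # array([1, 2, 3, 4, 5])
print(a[1:4])         # [2 3 4]
print(a[3:6])         # [4 5]  (stop 6 is clipped to len(a), no error)

# A single out-of-range index, by contrast, raises IndexError.
try:
    a[6]
except IndexError as exc:
    print("IndexError:", exc)
```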