I am using this for loop to split a dataset into groups, but converting the list "y" into an array produces an error.
import numpy as np

def to_sequences(dataset, seq_size=1):
    x = []
    y = []
    for i in range(len(dataset)-seq_size):
        window = dataset[i:(i + seq_size), 0]
        x.append(window)
        window2 = dataset[(i + seq_size):i + seq_size + 5, 0]
        y.append(window2)
    return np.array(x),np.array(y)

seq_size = 5
trainX, trainY = to_sequences(train, seq_size)
print("Shape of training set: {}".format(trainX.shape))
print("Shape of training set: {}".format(trainY.shape))
And this is the error message I get:
VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  return np.array(x),np.array(y)
I couldn't figure out why it works for 'x' but not for 'y'. Any ideas?
Answer:
In [247]: dataset = np.arange(20)
In [248]: def to_sequences(dataset, seq_size=1):
     ...:     x = []
     ...:     y = []
     ...:     for i in range(len(dataset)-seq_size):
     ...:         window = dataset[i:(i + seq_size), 0]
     ...:         x.append(window)
     ...:         window2 = dataset[(i + seq_size):i + seq_size + 5, 0]
     ...:         y.append(window2)
     ...:     return np.array(x),np.array(y)
     ...:
and a sample run:
In [250]: to_sequences(dataset[:,None], 5)
<ipython-input-248-176eb762993c>:9: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  return np.array(x),np.array(y)
Out[250]:
(array([[ 0,  1,  2,  3,  4],
        [ 1,  2,  3,  4,  5],
        [ 2,  3,  4,  5,  6],
        [ 3,  4,  5,  6,  7],
        [ 4,  5,  6,  7,  8],
        [ 5,  6,  7,  8,  9],
        [ 6,  7,  8,  9, 10],
        [ 7,  8,  9, 10, 11],
        [ 8,  9, 10, 11, 12],
        [ 9, 10, 11, 12, 13],
        [10, 11, 12, 13, 14],
        [11, 12, 13, 14, 15],
        [12, 13, 14, 15, 16],
        [13, 14, 15, 16, 17],
        [14, 15, 16, 17, 18]]),
 array([array([5, 6, 7, 8, 9]), array([ 6,  7,  8,  9, 10]),
        array([ 7,  8,  9, 10, 11]), array([ 8,  9, 10, 11, 12]),
        array([ 9, 10, 11, 12, 13]), array([10, 11, 12, 13, 14]),
        array([11, 12, 13, 14, 15]), array([12, 13, 14, 15, 16]),
        array([13, 14, 15, 16, 17]), array([14, 15, 16, 17, 18]),
        array([15, 16, 17, 18, 19]), array([16, 17, 18, 19]),
        array([17, 18, 19]), array([18, 19]), array([19])], dtype=object))
The first array has shape (n, 5) and an int dtype. The second has object dtype and contains arrays: most of them have shape (5,), but the last four have shapes (4,), (3,), (2,) and (1,).
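A quick way to confirm this kind of raggedness before calling np.array is to look at the length of each window. The snippet below is only a sketch that rebuilds the y list from the question on the same np.arange(20) sample data; the names y_windows and lengths are made up for illustration.

```python
import numpy as np

dataset = np.arange(20)[:, None]   # same sample data as the run above, as a column
seq_size = 5

# Rebuild the "y" list exactly as the loop in the question does.
y_windows = [dataset[i + seq_size:i + seq_size + 5, 0]
             for i in range(len(dataset) - seq_size)]

# If the lengths are not all equal, np.array cannot build a regular 2-D array.
lengths = [len(w) for w in y_windows]
print(lengths)        # [5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 3, 2, 1]
print(set(lengths))   # more than one distinct length means the list is ragged
```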
The slice
dataset[(i + seq_size):i + seq_size + 5, 0]
runs past the end of dataset for the last few values of i. Python/numpy allows that, but the result is truncated. You'll have to rethink the y slicing if you want an (n, 5) shaped array.
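One possible way to rethink it, offered as a sketch rather than the only fix: stop the loop early enough that the target slice always fits inside the data. The horizon parameter is an assumption introduced here to replace the hard-coded 5; it is not part of the original function.

```python
import numpy as np

def to_sequences(dataset, seq_size=1, horizon=5):
    """Build input windows and target windows from column 0 of `dataset`.

    `horizon` (hypothetical name) is the length of each target window.
    """
    x, y = [], []
    # Last valid start is the one whose target slice ends exactly at len(dataset).
    for i in range(len(dataset) - seq_size - horizon + 1):
        x.append(dataset[i:i + seq_size, 0])
        y.append(dataset[i + seq_size:i + seq_size + horizon, 0])
    return np.array(x), np.array(y)

dataset = np.arange(20)[:, None]
trainX, trainY = to_sequences(dataset, seq_size=5)
print(trainX.shape, trainY.shape)   # (11, 5) (11, 5)
print(trainY[-1])                   # [15 16 17 18 19]
```

The trade-off is fewer samples: with 20 rows, seq_size=5 and horizon=5 there are 11 complete pairs instead of 15, because the four starts that would have produced truncated targets are dropped.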
Slicing past the end of a list behaves the same way:
In [252]: [1,2,3,4,5][1:4]
Out[252]: [2, 3, 4]
In [253]: [1,2,3,4,5][3:6]
Out[253]: [4, 5]
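The same behaviour can be checked on a NumPy array. This small sketch uses a made-up array a; it also shows that integer indexing, unlike slicing, does raise when the index is out of range.

```python
import numpy as np

a = np.arange(1, 6)            # array([1, 2, 3, 4, 5])

inside = a[1:4]                # slice fully inside the array
past_end = a[3:6]              # stop index 6 is beyond the last element

print(inside, inside.shape)        # [2 3 4] (3,)
print(past_end, past_end.shape)    # [4 5] (2,)

# A plain integer index out of range is an error, not a silent truncation.
try:
    a[6]
    raised = False
except IndexError:
    raised = True
print(raised)                  # True
```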