Building a quick GRU model for stock prediction

I am a beginner with RNNs and would like to build a working gated recurrent unit (GRU) model for stock prediction.

I have a numpy array for the training data with this shape:

train_x.shape
(1122,20,320)

`1122` is the total number of timestamps I have
`20` is the number of past timestamps I want to predict the future from
`320` is the number of features (different stocks)

My train_y.shape is (1122,), and it holds a binary label: 1 is a buy and 0 is a sell.
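The question does not say how train_x was built, but a common way to get a (1122, 20, 320) array is to slide a 20-step window over a 2-D table of prices, producing one sample per window. The sketch below assumes a hypothetical raw table of shape (1141, 320), chosen so that 1141 - 20 + 1 = 1122 windows result; the helper name make_windows is also hypothetical. It shows that the first axis counts samples (one per window end), which is the axis Keras batches over.

```python
import numpy as np

def make_windows(series, window=20):
    # series: (num_timestamps, num_features) -> (num_windows, window, num_features)
    # Each sample is `window` consecutive rows; consecutive samples overlap.
    n = series.shape[0] - window + 1
    return np.stack([series[i:i + window] for i in range(n)])

rng = np.random.default_rng(0)
prices = rng.normal(size=(1141, 320))  # hypothetical: 1141 timestamps x 320 stocks
train_x = make_windows(prices, window=20)
print(train_x.shape)  # (1122, 20, 320)
```

Each label in train_y would then describe the timestamp that follows (or ends) its window; how the labels are aligned is up to the asker's setup.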

With that in mind, I started writing my GRU model as follows:

from tensorflow.keras import layers
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import SGD

def GRU_model(train_x,train_y,test_x,test_y):

    model = Sequential()
    model.add(layers.Embedding(train_x.shape[0],50,input_length=320))
    model.add(layers.GRU(50, return_sequences=True,input_shape=(train_x.shape[1],1),activation='tanh'))
    model.add(layers.GRU(50, return_sequences=True,input_shape=(train_x.shape[1],1),activation='tanh'))
    model.add(layers.GRU(50, return_sequences=True,input_shape=(train_x.shape[1],1),activation='tanh'))
    model.add(layers.GRU(50,activation='tanh'))
    model.add(Dense(units=2))
    model.compile(optimizer=SGD(lr=0.01,decay=1e-7,momentum=0.9,nesterov=False),loss='mean_squared_error')

    model.fit(train_x,train_y,epochs=EPOCHS,batch_size=BATCH_SIZE)

    GRU_predict = model.predict(test_x)

    return model,GRU_predict



my_gru_model,my_gru_predict = GRU_model(train_x,train_y,validation_x,validation_y)
ValueError: Input 0 of layer gru_42 is incompatible with the layer: expected ndim=3, found ndim=4. Full shape received: (None, 20, 320, 50)

Clearly the input dimensions I feed into the model are wrong, but I do not understand how they should be arranged so that the model runs.

Answer:

If you have 1122 samples, each with 20 time steps of 320 features, and you want your model to make a binary decision between buying and selling, try something like this:

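import tensorflow as tf
tf.random.set_seed(1)  # reproducible weight initialization

model = tf.keras.Sequential()
# First GRU: each sample is (timesteps=20, features=320); the batch size is left out.
# return_sequences=True passes one 50-dim hidden state per time step to the next GRU.
model.add(tf.keras.layers.GRU(50, return_sequences=True, input_shape=(20, 320), activation='tanh'))
# Second GRU returns only its final hidden state, shape (batch, 50).
model.add(tf.keras.layers.GRU(50, activation='tanh'))
# A single sigmoid unit: the predicted probability that the label is 1 (buy).
model.add(tf.keras.layers.Dense(units=1, activation='sigmoid'))

model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01, decay=1e-7, momentum=0.9, nesterov=False), loss='binary_crossentropy')
print(model.summary())

# Random stand-in data with the same shapes as the question's train_x and train_y.
train_x = tf.random.normal((1122, 20, 320))
train_y = tf.random.uniform((1122,), maxval=2, dtype=tf.int32)  # labels 0 or 1
model.fit(train_x, train_y, epochs=5, batch_size=16)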

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 gru (GRU)                   (None, 20, 50)            55800     
                                                                 
 gru_1 (GRU)                 (None, 50)                15300     
                                                                 
 dense (Dense)               (None, 1)                 51        
                                                                 
=================================================================
Total params: 71,151
Trainable params: 71,151
Non-trainable params: 0
_________________________________________________________________
None
Epoch 1/5
71/71 [==============================] - 5s 21ms/step - loss: 0.7050
Epoch 2/5
71/71 [==============================] - 2s 22ms/step - loss: 0.6473
Epoch 3/5
71/71 [==============================] - 1s 21ms/step - loss: 0.5513
Epoch 4/5
71/71 [==============================] - 1s 21ms/step - loss: 0.3640
Epoch 5/5
71/71 [==============================] - 1s 20ms/step - loss: 0.1258
<keras.callbacks.History at 0x7f4eac87e610>
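The output shapes and parameter counts in the summary above (55800 for the first GRU, 15300 for the second) can be reproduced with a minimal NumPy sketch of a GRU layer. It assumes the TF2 Keras default reset_after=True, which keeps two bias vectors per gate set; the gate order (z, r, h) and the helper names init_gru, gru_forward and count_params are assumptions for illustration, and the random weights are untrained, so this only demonstrates shapes, not predictions. The Dense layer adds 50 weights + 1 bias = 51, giving 71,151 in total.

```python
import numpy as np

def init_gru(input_dim, units, rng):
    # Three gates (update z, reset r, candidate h) stacked along the last axis.
    # Two bias rows because reset_after=True (assumed, the TF2 Keras default).
    return {
        "kernel": rng.normal(scale=0.1, size=(input_dim, 3 * units)),
        "recurrent_kernel": rng.normal(scale=0.1, size=(units, 3 * units)),
        "bias": np.zeros((2, 3 * units)),
    }

def count_params(p):
    return sum(v.size for v in p.values())

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_forward(x, p, return_sequences=False):
    # x: (batch, timesteps, features)
    batch, timesteps, _ = x.shape
    units = p["recurrent_kernel"].shape[0]
    h = np.zeros((batch, units))
    outputs = []
    for t in range(timesteps):
        xw = x[:, t, :] @ p["kernel"] + p["bias"][0]   # input projections
        hw = h @ p["recurrent_kernel"] + p["bias"][1]  # recurrent projections
        xz, xr, xh = np.split(xw, 3, axis=-1)
        hz, hr, hh = np.split(hw, 3, axis=-1)
        z = sigmoid(xz + hz)                 # update gate
        r = sigmoid(xr + hr)                 # reset gate
        h_tilde = np.tanh(xh + r * hh)       # candidate; reset applied after the matmul
        h = z * h + (1.0 - z) * h_tilde
        outputs.append(h)
    # (batch, timesteps, units) with return_sequences, else only the last state
    return np.stack(outputs, axis=1) if return_sequences else h

rng = np.random.default_rng(1)
x = rng.normal(size=(4, 20, 320))   # hypothetical batch of 4 samples
p1 = init_gru(320, 50, rng)
p2 = init_gru(50, 50, rng)
seq = gru_forward(x, p1, return_sequences=True)
last = gru_forward(seq, p2)
print(seq.shape, last.shape, count_params(p1), count_params(p2))
# (4, 20, 50) (4, 50) 55800 15300
```

In closed form, under the reset_after assumption, a GRU has 3 * (units * (input_dim + units) + 2 * units) parameters: 3 * (50 * 370 + 100) = 55800 and 3 * (50 * 100 + 100) = 15300.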

Note that the model has a single output node because it makes a binary decision: the sigmoid activation turns that node's value into the probability of a buy (label 1). This is also why you should use the binary_crossentropy loss.
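For intuition, the binary cross-entropy loss and the usual step of thresholding the sigmoid output into a buy/sell decision fit in a few lines of NumPy. This is a sketch of the standard formula, not Keras's implementation; the helper names, the clipping constant eps and the 0.5 threshold are assumptions for illustration.

```python
import numpy as np

def binary_crossentropy(y_true, p_pred, eps=1e-7):
    # Clip so log() never receives exactly 0 or 1.
    p = np.clip(p_pred, eps, 1.0 - eps)
    return float(np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))))

def to_decisions(p_pred, threshold=0.5):
    # 1 = buy, 0 = sell, matching the labels in train_y.
    return (p_pred >= threshold).astype(int)

y = np.array([1, 0, 1, 0])
good = np.array([0.9, 0.1, 0.8, 0.2])  # confident and correct
bad = np.array([0.1, 0.9, 0.2, 0.8])   # confident and wrong
print(binary_crossentropy(y, good) < binary_crossentropy(y, bad))  # True
print(to_decisions(good))  # [1 0 1 0]
```

A model that always outputs 0.5 scores ln 2 (about 0.693), so losses well below that mean the predictions carry information about the labels, at least on the data they are measured on.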

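Regarding the GRU layer: it expects input of shape (batch_size, timesteps, features). The batch size is only known when data is fed in, so it is left out of input_shape, which becomes (20, 320). Because the second GRU needs the same 3-D layout, the first GRU uses return_sequences=True, so it outputs a sequence of shape (batch_size, 20, 50): one 50-dimensional hidden state for each of the 20 input time steps. The last GRU keeps the default return_sequences=False and returns only its final hidden state, of shape (batch_size, 50), which the Dense layer maps to a single probability.

You also do not need an Embedding layer in your case. It is normally used to map integer sequences, such as tokenized text, to dense n-dimensional vectors. It is also what caused your error: it appends a new axis of size 50 to its input, turning your (None, 20, 320) input into the 4-D tensor (None, 20, 320, 50), which the GRU rejects because it expects ndim=3.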
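The extra axis in the error message can be reproduced without TensorFlow, because an embedding is essentially an integer-indexed lookup table: indexing a (vocab_size, embed_dim) table with an integer array returns the input shape plus a trailing embed_dim axis. This NumPy analogy (not TensorFlow's implementation) mirrors Embedding(train_x.shape[0], 50); the batch of 2 all-zero ids is hypothetical.

```python
import numpy as np

vocab_size, embed_dim = 1122, 50  # mirrors Embedding(train_x.shape[0], 50)
table = np.random.default_rng(0).normal(size=(vocab_size, embed_dim))

ids = np.zeros((2, 20, 320), dtype=int)  # hypothetical batch of 2 integer inputs
embedded = table[ids]                    # each id is replaced by a 50-dim row
print(embedded.shape)                    # (2, 20, 320, 50)
```

With batch size None instead of 2, this is exactly the (None, 20, 320, 50) shape the GRU refused. Note that the lookup also only makes sense for integer ids, while train_x holds real-valued stock features.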