I am using the following architecture, a Masking layer (for trajectories of varying length, padded with 0s up to the maximum trajectory length) followed by LSTM layers and dense layers that output 2 values, to build a regressor that predicts 2 values from a trajectory.
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

samples, timesteps, features = x_train.shape[0], x_train.shape[1], x_train.shape[2]
model = Sequential()
model.add(tf.keras.layers.Masking(mask_value=0., input_shape=(timesteps, features), name="mask"))
model.add(LSTM(30, return_sequences=True, name="lstm1"))
model.add(LSTM(30, return_sequences=False, name="lstm2"))
model.add(Dense(20, activation='relu', name="dense1"))
model.add(Dense(20, activation='relu', name="dense2"))
model.add(Dense(2, activation='linear', name="output"))
model.compile(optimizer="adam", loss="mse")
training with:
model.fit(x_train, y_train, epochs=10, batch_size=32)
My input data are of shape:
x_train (269, 527, 11) (269 trajectories of 527 timesteps of 11 features)
y_train (269, 2) (these 269 trajectories have 2 target values)
x_test (30, 527, 11) (--- same ---)
y_test (30, 2) (--- same ---)
I've preprocessed my data so that all sequences have a fixed length, and shorter ones are padded with 0s at the missing timesteps. Thus, I'm using a Masking layer to skip these timesteps, as they provide no information.
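For reference, the padding was done with something along the lines of the sketch below (raw_trajectories is just a placeholder name for my list of variable-length arrays):

from tensorflow.keras.preprocessing.sequence import pad_sequences

# raw_trajectories: placeholder for the list of arrays of shape (length_i, 11)
x_train = pad_sequences(raw_trajectories,
                        maxlen=527,        # maximum trajectory length
                        dtype="float32",
                        padding="post",    # append 0-filled timesteps at the end
                        value=0.0)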
As expected, the output is of shape:
(30, 2)
But looking into it, it seems like it's regressing to the same values:
[[37.48257 0.7025466 ]
[37.48258 0.70254654]
[37.48257 0.70254654]
[37.48257 0.7025466 ]
[37.48258 0.70254654]
[37.48258 0.70254654]
[37.48258 0.70254654]
[37.48258 0.7025465 ]
[42.243515 0.6581909 ]
[37.48258 0.70254654]
[37.48257 0.70254654]
[37.48258 0.70254654]
[37.48261 0.7025462 ]
[37.48257 0.7025466 ]
[37.482582 0.70254654]
[37.482567 0.70254654]
[37.48257 0.7025466 ]
[37.48258 0.70254654]
[37.48258 0.70254654]
[37.48257 0.7025466 ]
[37.48258 0.70254654]
[37.48258 0.70254654]
[37.48258 0.70254654]
[37.482567 0.7025465 ]
[37.48261 0.7025462 ]
[37.482574 0.7025466 ]
[37.48261 0.7025462 ]
[37.48261 0.70254624]
[37.48258 0.70254654]
[37.48261 0.7025462 ]]
while my target values (y_test) are these:
[[70. 0.6]
[40. 0.6]
[ 6. 0.6]
[94. 0.7]
[50. 0.6]
[60. 0.6]
[16. 0.6]
[76. 0.9]
[92. 0.6]
[32. 0.8]
[22. 0.7]
[70. 0.7]
[36. 1. ]
[64. 0.7]
[ 0. 0.9]
[82. 0.9]
[38. 0.6]
[54. 0.8]
[28. 0.8]
[62. 0.7]
[12. 0.6]
[72. 0.8]
[66. 0.8]
[ 2. 1. ]
[98. 1. ]
[20. 0.8]
[82. 1. ]
[38. 1. ]
[68. 0.6]
[62. 1. ]]
It's as if it's treating the whole dataset as one data point. Can anybody with some experience spot an obvious mistake here?
Appreciate any kind of help!
CodePudding user response:
When the weights are random, they contribute chaotically to the computation for any concrete input, so the model produces nearly the same output for every sample. Did you train the model? It looks like you didn't; consider the output of a simple MNIST solver before training:
[-2.39 -2.54 -2.23 -2.24 -2.29 -2.37 -2.39 -2.10 -2.34 -2.20]
[-2.28 -2.43 -2.25 -2.33 -2.28 -2.42 -2.26 -2.19 -2.37 -2.25]
[-2.43 -2.44 -2.25 -2.33 -2.33 -2.37 -2.30 -2.10 -2.37 -2.17]
[-2.33 -2.43 -2.28 -2.27 -2.34 -2.34 -2.28 -2.16 -2.37 -2.26]
and after:
[-31.72 -31.65 -25.43 -20.04 -29.68 -0.00 -22.74 -25.88 -16.28 -13.30] (5)
[-12.44 -29.92 -21.19 -25.86 -22.53 -12.01 -0.00 -22.61 -18.88 -23.54] (6)
[-23.86 -25.77 -11.88 -9.18 -19.51 -20.85 -28.71 -0.00 -22.11 -14.57] (7)
[-33.67 -23.45 -17.82 -0.00 -28.89 -14.20 -32.54 -14.45 -11.13 -15.40] (3)
UPD: So training is present, but it isn't accomplishing its target. Well, a lot of things can be the reason. Besides technical issues, the task may simply be too complex for the neural network, for example, if the target function can't be learned through gradual improvement.
Check your data paths, try to simplify the task, and find an example solution to a similar problem that you can examine and rework.
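For example, one quick sanity check is to look at whether the training loss actually decreases across epochs; a minimal sketch using the history object that model.fit returns:

history = model.fit(x_train, y_train, epochs=10, batch_size=32,
                    validation_data=(x_test, y_test))

# If these numbers barely move, the model isn't learning anything useful yet
print(history.history["loss"])
print(history.history["val_loss"])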