Irrelevant results of seq-to-seq LSTM


I am trying to predict a sequence of integers based on the input numbers.

The input consists of 10-digit integer values:

array([[2021001001],
       [2021001002],
       ...,
       [2021335249]], dtype=int64)

The output is the following: an array whose rows each contain 7 integers.

array([[23, 26, 17, ..., 21, 16,  4],
       [13, 24,  2, ..., 27, 10, 28],
       ...,
       [ 5, 16, 28, ..., 12, 27, 26]], dtype=int64)

This means that sequence number (input) [2021001001] maps to the following sequence (output): [23, 26, 17, ..., 21, 16, 4].

I tried training an LSTM on these inputs and outputs to predict what the following sequence will be based on a sequence number, using about 60K rows of historical data. Here's what I have so far:

import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential()
model.add(layers.LSTM(256, activation='relu', input_shape=(10, 1), recurrent_dropout=0.2))
model.add(layers.Dense(7))
model.compile(optimizer=tf.keras.optimizers.Adam(0.00001),
              loss=tf.keras.losses.MeanSquaredError(),
              metrics=['accuracy'])

model.fit(inputs, output, epochs=10, verbose=1, validation_split=0.2, batch_size=256)

When testing the model after fitting, I get weird results like the following:

predictNextNumber = model.predict(tests_[0], verbose=1)
print(predictNextNumber)

1/1 [==============================] - 0s 253ms/step
[[[14.475913]
  [14.757163]
  [14.874351]
  [14.702476]
  [14.639976]
  [14.624351]
  [14.655601]]]

The expected output, however, should be an array of integers: [24, 12, 3, 5, 11, 8, 4].

I'm having trouble figuring out what the problem is. Keras complained a lot about the shapes at first, but once that was handled I kept getting bad results. Any help would be appreciated.

CodePudding user response:

The description of your problem is a bit vague. It would be useful to have some actual data so that we can try this ourselves. It's also unclear what the data represents, so we can't tell you whether what you're doing even has a chance of success; it's not clear that the x values can predict the y values at all.

However, it is very likely that your inputs and outputs are too big for your network. Networks (usually) work better with numbers in [-1, 1], so what you should probably do is standardize your data with something like a StandardScaler. You don't have to install sklearn for this: you can just compute the mean and standard deviation of your data and scale everything according to

x_scaled = (x - m) / d

and

x = x_scaled * d + m

for the inverse operation, where m is the mean and d is the standard deviation of your data x.

Since your inputs and outputs appear to come from different distributions, you'd have to do this two times.
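
For example, a minimal NumPy sketch of this manual scaling (with random arrays standing in for your actual data) could look like this:

import numpy as np

# Dummy arrays standing in for the real inputs and outputs.
rng = np.random.default_rng(0)
x = rng.integers(2021001001, 2021335250, size=(60000, 10, 1)).astype("float64")
y = rng.integers(1, 29, size=(60000, 7)).astype("float64")

# One mean/std pair per distribution: inputs and outputs are scaled separately.
x_m, x_d = x.mean(), x.std()
y_m, y_d = y.mean(), y.std()

x_scaled = (x - x_m) / x_d      # forward: (x - m) / d
y_scaled = (y - y_m) / y_d

# Inverse operation: x = x_scaled * d + m
x_restored = x_scaled * x_d + x_m
assert np.allclose(x_restored, x)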

Assuming you use sklearn's StandardScaler, you'd do something like this:

from sklearn.preprocessing import StandardScaler
import numpy as np
import pandas as pd

# Fit one scaler per distribution.
x_scaler = StandardScaler().fit(x_train)
y_scaler = StandardScaler().fit(y_train)
scalers = dict(x=x_scaler, y=y_scaler)

# Use scaler.transform(x) when building each dataset split.
train_data = get_dataset(scalers, mode="train")
valid_data = get_dataset(scalers, mode="dev")
test_data = get_dataset(scalers, mode="test")

model.fit(train_data, validation_data=valid_data)

# Look at some test data by using scaler.inverse_transform(data)
df = pd.DataFrame([], columns=["target", "prediction"])
for x, y in test_data:
    y_pred = model(x)
    y_pred = y_scaler.inverse_transform(y_pred)
    # Assumes a single target column; stacks target and prediction side by side.
    data = np.concatenate([y, y_pred], axis=-1)
    df = pd.concat([df, pd.DataFrame(data, columns=["target", "prediction"])])

df.target = df.target.astype(int)
df.prediction = df.prediction.round(2)
print(df)
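
Note that get_dataset above is not a library function; a hypothetical implementation on top of tf.data could look like the following (the split arrays and pipeline details are assumptions):

import tensorflow as tf

def get_dataset(scalers, mode="train", batch_size=256):
    # Hypothetical helper: the split arrays (x_train, y_train, ...) are
    # assumed to exist as 2D NumPy arrays.
    splits = {"train": (x_train, y_train),
              "dev": (x_dev, y_dev),
              "test": (x_test, y_test)}
    x, y = splits[mode]
    x = scalers["x"].transform(x)[..., None]  # add the feature axis for the LSTM
    if mode != "test":
        # Keep test targets in original units so they can be compared
        # directly with the inverse-transformed predictions in the loop above.
        y = scalers["y"].transform(y)
    ds = tf.data.Dataset.from_tensor_slices((x, y))
    if mode == "train":
        ds = ds.shuffle(len(x))
    return ds.batch(batch_size)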

CodePudding user response:

The input numbers are very big, so add a normalization layer:

normalization_layer = tf.keras.layers.Normalization()
normalization_layer.adapt(inputs)

model = tf.keras.Sequential()
model.add(tf.keras.Input(shape=(10, 1)))
model.add(normalization_layer)
model.add(layers.LSTM(256, activation='relu', recurrent_dropout=0.2))
...

You might need to train for many more epochs.

The learning_rate of the optimizer also seems a bit low; maybe try the default value first.

Since you are predicting continuous values, your metric should not be accuracy, but mse, mae, or similar.
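
Putting these suggestions together, a minimal sketch of the adjusted setup might look like this (the epoch count and other hyperparameters are just starting points, and inputs/output are the arrays from the question):

import tensorflow as tf
from tensorflow.keras import layers

# Adapt the normalization layer to the raw inputs.
normalization_layer = tf.keras.layers.Normalization()
normalization_layer.adapt(inputs)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(10, 1)),
    normalization_layer,
    layers.LSTM(256, activation='relu', recurrent_dropout=0.2),
    layers.Dense(7),
])

# Default Adam learning rate (1e-3) and a regression metric instead of accuracy.
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss='mse',
              metrics=['mae'])

# More epochs than the original 10; adjust based on the validation loss.
model.fit(inputs, output, epochs=100, validation_split=0.2, batch_size=256)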
