loss (I am using mean_absolute_percentage_error) is very high while training for simple regression using Keras


This is for the House Prices - Advanced Regression dataset. Here is the code:

from tensorflow import keras

model = keras.models.Sequential([keras.layers.Flatten(input_shape=[76,1])])
for _ in range(20):
    model.add(keras.layers.BatchNormalization())
    model.add(keras.layers.Activation('selu'))
    model.add(keras.layers.Dense(100, activation='relu'))
model.add(keras.layers.Dense(1))

model.compile(loss='mean_absolute_percentage_error', optimizer=keras.optimizers.SGD(learning_rate=1e-2, decay=2e-4))

early_stopping_cb = keras.callbacks.EarlyStopping(patience=10)
checkpoint_cb = keras.callbacks.ModelCheckpoint("house_prediction_model.h5", save_best_only=True)

history = model.fit(X_train, y_train, epochs=100,
                    validation_data=(X_cv, y_cv),
                    callbacks=[checkpoint_cb, early_stopping_cb])

I thought there was a problem with exploding gradients, so I added BatchNormalization. I also tried without it and there was no change.

output:

Epoch 1/100
30/30 [==============================] - 1s 24ms/step - loss: 100.0000 - val_loss: 100.0000
Epoch 2/100
30/30 [==============================] - 0s 15ms/step - loss: 99.9999 - val_loss: 100.0000
Epoch 3/100
30/30 [==============================] - 0s 14ms/step - loss: 100.0000 - val_loss: 100.0000
Epoch 4/100
30/30 [==============================] - 1s 19ms/step - loss: 99.9999 - val_loss: 100.0000
Epoch 5/100
30/30 [==============================] - 0s 15ms/step - loss: 99.9999 - val_loss: 100.0000
Epoch 6/100
30/30 [==============================] - 0s 14ms/step - loss: 100.0000 - val_loss: 100.0000
Epoch 7/100
30/30 [==============================] - 0s 12ms/step - loss: 100.0000 - val_loss: 100.0000
Epoch 8/100
30/30 [==============================] - 0s 15ms/step - loss: 99.9999 - val_loss: 100.0000
Epoch 9/100
30/30 [==============================] - 0s 14ms/step - loss: 99.9999 - val_loss: 100.0000
Epoch 10/100
30/30 [==============================] - 0s 15ms/step - loss: 99.9999 - val_loss: 100.0000

Please help me solve this.

CodePudding user response:

It is not usual (and, as far as I know, it is a bad idea) to apply either batch normalization or an activation function directly to the inputs as you are doing here, before they have gone through any Dense layer.

Simple regression models should usually start with a Dense layer. I suggest you see what happens if you cut the model down to something like Dense(n, activation='relu') followed by Dense(1). You can then add batch normalization between the Dense layers to see if that helps; a sketch follows below.
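A minimal sketch of that simplified architecture (the hidden width of 100 is just a placeholder, and the 76-feature input shape is taken from the question):

    from tensorflow import keras

    # Baseline: one hidden Dense layer, then the linear output layer.
    model = keras.models.Sequential([
        keras.layers.Input(shape=(76,)),
        keras.layers.Dense(100, activation='relu'),
        # Optionally add keras.layers.BatchNormalization() here and retrain.
        keras.layers.Dense(1),
    ])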

Your input data (both X and y) should be standardized before being fed into the model. Unstandardized data can interfere with optimization; see the sketch below.
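For example, with scikit-learn's StandardScaler (X_train, X_cv, y_train, y_cv are the arrays from the question; fitting only on the training split is deliberate):

    import numpy as np
    from sklearn.preprocessing import StandardScaler

    x_scaler = StandardScaler()
    X_train_s = x_scaler.fit_transform(X_train)  # learn mean/std on training data only
    X_cv_s = x_scaler.transform(X_cv)            # apply the same statistics to validation data

    # Target scaling is optional; see the discussion of the loss function below.
    y_scaler = StandardScaler()
    y_train_s = y_scaler.fit_transform(np.asarray(y_train).reshape(-1, 1))
    y_cv_s = y_scaler.transform(np.asarray(y_cv).reshape(-1, 1))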

You might also think about whether mean_absolute_percentage_error is the right loss function; it usually isn't (and it certainly isn't if you have standardized y as suggested above, since a standardized target can be zero or negative). If your y value is something like income or house prices, i.e. always positive and something for which you genuinely want to minimize the mean absolute percentage error, then you cannot standardize it and still use mean_absolute_percentage_error.

One alternative is not to standardize y at all, but that could (and I think probably will) still have convergence problems. If you want to try this, then at least scale y so that its standard deviation is around 1.0.

A better approach (with, say, house prices) might be to use log(house price), standardize that before feeding it into the model as your y variable, and then use mean_absolute_error or mean_squared_error as the loss; see the sketch below.
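A sketch of that target transformation, assuming y_train / y_cv hold strictly positive prices and X_train_s / X_cv_s are the standardized features from the earlier sketch:

    import numpy as np
    from sklearn.preprocessing import StandardScaler

    # Train against standardized log-prices instead of raw prices.
    y_scaler = StandardScaler()
    y_train_log = y_scaler.fit_transform(np.log(np.asarray(y_train)).reshape(-1, 1))
    y_cv_log = y_scaler.transform(np.log(np.asarray(y_cv)).reshape(-1, 1))

    model.compile(loss='mean_squared_error', optimizer='adam')
    model.fit(X_train_s, y_train_log, epochs=100,
              validation_data=(X_cv_s, y_cv_log))

    # Invert both steps to get prices back from predictions.
    pred_price = np.exp(y_scaler.inverse_transform(model.predict(X_cv_s)))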

Also, maybe start with a simpler model, say one or two hidden layers, before you try 20.

CodePudding user response:

Please check these parts of your code:

  • input_shape: if your dataset has 76 features, define the input shape as (76,). Then you don't need the Flatten layer; use an Input layer with that shape instead.
  • layers: it is better to start with a simple architecture (few layers) and only move to a bigger model if that is not enough, so reduce the number of iterations in your loop.
  • selu: I don't know why you used the selu activation here; is it based on a particular approach?
  • output_layer: because of the range of the target values, if the outputs span a large range it can help to add a Lambda layer after the last layer to scale the values up to that range, something like keras.layers.Lambda(lambda val: 100.0 * val).
  • optimizer: if you want to use SGD, it is better to tune the learning rate with a learning rate schedule callback and pick the lr based on that; otherwise, substitute Adam or RMSprop.
  • loss: because of the vital role the loss function plays in a neural network, you should choose one that changes smoothly: mean squared error, mean absolute error, or the Huber loss. For this type of problem I think the Huber loss is a better choice. (A sketch putting these suggestions together follows after this list.)
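Putting those suggestions together, a rough sketch (the layer widths, Huber delta, and Adam learning rate are assumptions, not values from the question):

    from tensorflow import keras

    model = keras.models.Sequential([
        keras.layers.Input(shape=(76,)),             # 76 features, so no Flatten layer is needed
        keras.layers.Dense(64, activation='relu'),   # start small; grow only if underfitting
        keras.layers.Dense(32, activation='relu'),
        keras.layers.Dense(1),
        # Only if the targets span a large range and are not rescaled beforehand:
        # keras.layers.Lambda(lambda val: 100.0 * val),
    ])

    model.compile(loss=keras.losses.Huber(delta=1.0),  # smoother than MAPE
                  optimizer=keras.optimizers.Adam(learning_rate=1e-3))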