I have two Keras models. I concatenate the first model's output layer into a single output, which I then use in the second model. However, I am unclear about how to normalize my data.
At what point should normalization occur? I normalize before the first model. I also tried normalizing through tf.keras.layers.LayerNormalization(axis=0) and tf.keras.layers.BatchNormalization(axis=0), but when should these be added?
Any guidance or resources are much appreciated.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.layers import Lambda, BatchNormalization
from tensorflow.keras import backend as K

def phi(lat_dim, feature_normaliser, activation):
    # First model: normalize the features, then embed them into lat_dim dimensions
    model1 = keras.Sequential()
    model1.add(feature_normaliser)
    model1.add(layers.Dense(100, activation=activation))
    model1.add(layers.Dense(lat_dim))
    return model1

def rho(model1, learning_rate, activation):
    # Second model: sum the embeddings over axis 0, then regress to a single value
    model2 = keras.Sequential()
    model2.add(model1)
    model2.add(Lambda(lambda x: K.sum(x, axis=0, keepdims=True)))
    # tf.keras.layers.BatchNormalization(axis=0)
    model2.add(layers.Dense(100, activation=activation))
    model2.add(layers.Dense(1))
    model2.add(BatchNormalization())
    model2.compile(
        optimizer=tf.optimizers.SGD(learning_rate=learning_rate),
        loss='mean_squared_error')
    return model2
Calling the model results in nan:
feature_normaliser = layers.Normalization(input_shape=[10], axis=1, name='normaliser')
feature_normaliser.adapt(X_train)
phi_output = phi(5, feature_normaliser, 'relu')
rho_output = rho(phi_output, 0.0001, 'relu')
history_rho, rho_trained = Model_fit(rho_output, X_train, Y_train, X_val, Y_val, 128, 10)
print(history_rho.history['loss'][-1])
CodePudding user response:
You can normalize anywhere.
But there are two important normalizations:
- The input "data" should be normalized (usually outside the model)
- The output "data" should be normalized (usually outside the model) and your final activation must be compatible with this normalization
BatchNormalization can be used almost anywhere; there is no single correct answer. Just like building any model, using BatchNormalization is sort of an art: you test, see whether the results are good, change its place, and so on.
You can, for instance, not normalize the input data and instead put a BatchNormalization right after the input layer; that is one possibility. You can also use BatchNormalization before some activations to avoid vanishing gradients and ReLU units getting locked (dying ReLUs).
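For example, a minimal sketch of the "BatchNormalization before the activation" pattern (the input shape and layer sizes are arbitrary, not taken from your model):
from tensorflow import keras
from tensorflow.keras import layers

# Sketch only: Dense -> BatchNormalization -> Activation ordering.
model = keras.Sequential([
    keras.Input(shape=(10,)),        # arbitrary input size
    layers.Dense(100),               # no activation here yet
    layers.BatchNormalization(),     # normalize the pre-activations
    layers.Activation('relu'),       # activation applied after the normalization
    layers.Dense(1),                 # linear output for regression
])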
A few BatchNormalization layers in a model can make training a lot faster, but they are not necessary.
A warning: if you use Dropout, don't put BatchNormalization right after it. They are not compatible, because dropout changes the data distribution (it keeps the mean but changes the deviation), and this change creates a difference between training and validation that makes the normalization work differently in each phase.
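If you do combine them, an ordering like this sketch, with BatchNormalization before Dropout rather than right after it, avoids that problem (sizes and rates are arbitrary):
from tensorflow import keras
from tensorflow.keras import layers

# Sketch only: keep BatchNormalization before Dropout, not after it.
model = keras.Sequential([
    keras.Input(shape=(10,)),              # arbitrary input size
    layers.Dense(100, activation='relu'),
    layers.BatchNormalization(),           # normalization sees the full, undropped activations
    layers.Dropout(0.3),                   # dropout applied after the normalization
    layers.Dense(1),
])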