I have two Keras models. I concatenate the first model's output layer into a single output, which I then use in the second model. However, I am unclear about how to normalize my data.
At what point should normalization occur? I normalize before the first model. I also tried normalizing through tf.keras.layers.LayerNormalization(axis=0) and tf.keras.layers.BatchNormalization(axis=0), but when should these be added?
Any guidance or resources are much appreciated.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.layers import Lambda, BatchNormalization
from tensorflow.keras import backend as K

def phi(lat_dim, feature_normaliser, activation):
    # First model: normalize the features, then embed them into lat_dim dimensions
    model1 = keras.Sequential()
    model1.add(feature_normaliser)
    model1.add(layers.Dense(100, activation=activation))
    model1.add(layers.Dense(lat_dim))
    return model1

def rho(model1, learning_rate, activation):
    # Second model: sum the embeddings over axis 0, then regress to a single value
    model2 = keras.Sequential()
    model2.add(model1)
    model2.add(Lambda(lambda x: K.sum(x, axis=0, keepdims=True)))
    # tf.keras.layers.BatchNormalization(axis=0)
    model2.add(layers.Dense(100, activation=activation))
    model2.add(layers.Dense(1))
    model2.add(BatchNormalization())
    model2.compile(
        optimizer=tf.optimizers.SGD(learning_rate=learning_rate),
        loss='mean_squared_error')
    return model2
Calling the model results in nan:
feature_normaliser = layers.Normalization(input_shape=[10], axis=1, name='normaliser')
feature_normaliser.adapt(X_train)
phi_output = phi(5, feature_normaliser, 'relu')
rho_output = rho(phi_output, 0.0001, 'relu')
history_rho, rho_trained = Model_fit(rho_output, X_train, Y_train, X_val, Y_val, 128, 10)
print(history_rho.history['loss'][-1])
CodePudding user response:
You can normalize anywhere.
But there are two important normalizations:
- The input "data" should be normalized (usually outside the model)
- The output "data" should be normalized (usually outside the model) and your final activation must be compatible with this normalization
BatchNormalization can be used almost anywhere; there is no single correct answer. Just like building any model, using BatchNormalization is sort of an art: you test, see whether the results are good, change its place, and so on.
You can, for instance, not normalize the input data and instead put a BatchNormalization right after the input layer; that is one possibility. You can also use BatchNormalization before some activations to avoid vanishing gradients and ReLU units getting locked (dying ReLUs).
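For example, a minimal sketch of the "BatchNormalization before the activation" pattern (the input shape and layer sizes are arbitrary, not taken from your model):
from tensorflow import keras
from tensorflow.keras import layers

# Sketch only: Dense -> BatchNormalization -> Activation ordering.
model = keras.Sequential([
    keras.Input(shape=(10,)),        # arbitrary input size
    layers.Dense(100),               # no activation here yet
    layers.BatchNormalization(),     # normalize the pre-activations
    layers.Activation('relu'),       # activation applied after the normalization
    layers.Dense(1),                 # linear output for regression
])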
A few BatchNormalization layers in a model can make training a lot faster, but they are not necessary.
A warning: if you use Dropout, don't put BatchNormalization right after it. They are not compatible, because dropout changes the data distribution (it keeps the mean but changes the deviation), and this change creates a difference between training and validation that makes the normalization work differently in each phase.
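If you do combine them, an ordering like this sketch, with BatchNormalization before Dropout rather than right after it, avoids that problem (sizes and rates are arbitrary):
from tensorflow import keras
from tensorflow.keras import layers

# Sketch only: keep BatchNormalization before Dropout, not after it.
model = keras.Sequential([
    keras.Input(shape=(10,)),              # arbitrary input size
    layers.Dense(100, activation='relu'),
    layers.BatchNormalization(),           # normalization sees the full, undropped activations
    layers.Dropout(0.3),                   # dropout applied after the normalization
    layers.Dense(1),
])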