Keras save_model() and load_model() methods not working as expected

Consider the following code:

from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam

def build_model():
    model = Sequential()
    model.add(Dense(units=64, kernel_initializer='uniform', activation='relu'))
    model.add(Dense(units=128, kernel_initializer='uniform', activation='relu'))
    model.add(Dense(units=64, kernel_initializer='uniform', activation='relu'))
    model.add(Dense(units=1, kernel_initializer='uniform', activation='relu'))
    optimizer = Adam(learning_rate=0.001)
    model.compile(loss='mean_squared_error', optimizer=optimizer)
    return model

# X_train, Y_train and X_test are assumed to be arrays loaded earlier (not shown).
model1 = build_model()
model1.fit(X_train, Y_train, epochs=10, batch_size=64, verbose=0)  # Initial fitting.

keras.models.save_model(model1, "models/initial")  # Saving the model before further fitting.
model1.fit(X_train, Y_train, epochs=15, batch_size=64, verbose=0)
predictions1 = model1.predict(X_test)

model2 = keras.models.load_model("models/initial")  # Loading the model after the initial fitting.
model2.fit(X_train, Y_train, epochs=15, batch_size=64, verbose=0)
predictions2 = model2.predict(X_test)

print("--------------------------------")
print(predictions1)
print("--------------------------------")
print(predictions2)
print("--------------------------------")

I build a model, fit it initially and then save it. After that, the model is fit for a further 15 epochs and predictions are made; this final model is not saved. Then the saved model (the state after the first 10 epochs) is loaded, and the same process (further fitting and predictions) is repeated. Since the same model is supposedly used, the two sets of predictions should be the same, but they are not. Any idea what may be causing this? Here are the results I get:

--------------------------------
[[2350.2917]
 [2369.7139]
 [2367.1833]
 [2373.8337]
 [2369.4788]
 [2373.716 ]
 [2372.0095]
 [2374.5989]
 [2374.658 ]]
--------------------------------
[[2532.1902]
 [2571.7231]
 [2566.572 ]
 [2580.1086]
 [2571.2444]
 [2579.8694]
 [2576.395 ]
 [2581.6663]
 [2581.786 ]]
--------------------------------

Answer:

Your reasoning is right; however, I think the different results here are (at least partly) due to the stochastic nature of training.

Suppose we both have the same network and initialize it in the same manner, e.g. with Xavier (Glorot) initialization.

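Do we then get the same results just because we initialize in the same way? Not necessarily. Unless every source of randomness is fixed, training is a non-deterministic process: the initial weights are drawn at random, model.fit() shuffles the training data at every epoch by default, and some GPU operations are themselves non-deterministic. Two runs will therefore almost never arrive at exactly the same weights.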

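Back to your example: if I save the model at step X with weights Y, keep training it, and then load the saved model (again with weights Y) and train that one as well, model_1 (not loaded) may arrive at a set of weights Z1 while model_2 (loaded) arrives at a different set of weights Z2. The loaded model does start from exactly the saved weights; the difference comes from the training that follows, e.g. the different order in which fit() shuffles the data.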
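To illustrate, here is a minimal, framework-free sketch (plain Python, no Keras) of that situation. A hypothetical helper `sgd_epochs` stands in for fit(): it trains a toy linear model y = w*x + b with SGD and shuffles the data every epoch. Both continuations start from the same "saved" weights and train for the same number of epochs; only the shuffle order differs. The toy data, learning rate and seeds are made-up assumptions.

```python
import random

def sgd_epochs(w, b, data, epochs, lr, rng):
    """Hypothetical stand-in for fit(): SGD on y = w*x + b with per-epoch shuffling.

    `rng` plays the role of the global RNG a framework uses to shuffle the
    training set at the start of every epoch.
    """
    data = list(data)  # copy so the caller's list is not reordered
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * 2 * err * x  # gradient of err**2 w.r.t. w
            b -= lr * 2 * err      # gradient of err**2 w.r.t. b
    return w, b

# Toy data (an assumption): y = 3x + 1 plus some fixed noise.
noise_rng = random.Random(0)
data = [(x / 10, 3 * x / 10 + 1 + noise_rng.gauss(0, 0.3)) for x in range(20)]

# "Saved" state after the initial training (weights Y).
saved_w, saved_b = sgd_epochs(0.0, 0.0, data, epochs=10, lr=0.01, rng=random.Random(1))

# model_1 keeps training; model_2 is "loaded" from the same saved weights and
# trains for the same number of epochs, but with a different shuffle order.
z1 = sgd_epochs(saved_w, saved_b, data, epochs=15, lr=0.01, rng=random.Random(2))
z2 = sgd_epochs(saved_w, saved_b, data, epochs=15, lr=0.01, rng=random.Random(3))

print("Z1:", z1)
print("Z2:", z2)
```

Z1 and Z2 differ even though both runs start from identical weights; the only thing that changed is the order in which the samples were visited.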

You can also check this related question: "Building Neural Network over and over with same parameters give different result".

The only way I am aware of to ensure reproducibility is to "seed everything", like below:

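import os, random
import numpy as np
import tensorflow as tf
seed = 42  # any fixed integer
random.seed(seed)
np.random.seed(seed)
os.environ['PYTHONHASHSEED'] = str(seed)  # only affects child processes; set it before starting Python
tf.random.set_seed(seed)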
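Seeding once at the top of the script is probably not enough for the comparison in the question: the extra model1.fit() call consumes random numbers before model2.fit() runs, so the two fits still see different shuffle orders. The seeds would need to be reset immediately before each of the two fits being compared, for example by re-running the seeding lines above, or by calling tf.keras.utils.set_random_seed(seed) (available since TF 2.7), right before each fit(). I have not verified that this alone makes GPU runs bit-identical. The plain-Python sketch below, with a hypothetical `continue_training` helper standing in for fit(), only shows the principle: the shuffle RNG is re-created from the same seed before each continuation.

```python
import random

def continue_training(w, b, data, epochs, lr, seed):
    """Hypothetical stand-in for fit(): SGD on y = w*x + b with per-epoch shuffling.

    The shuffling RNG is re-created from `seed` on every call, which mirrors
    re-running the "seed everything" lines right before each fit().
    """
    rng = random.Random(seed)
    data = list(data)  # copy so the caller's list is not reordered
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * 2 * err * x
            b -= lr * 2 * err
    return w, b

# Toy data with fixed "noise" (an assumption, for illustration only).
data = [(x / 10, 3 * x / 10 + 1 + (0.2 if x % 3 else -0.4)) for x in range(20)]
saved = (1.5, 0.5)  # hypothetical weights "saved" after the initial fit

seed = 42
z1 = continue_training(*saved, data, epochs=15, lr=0.01, seed=seed)  # model_1
z2 = continue_training(*saved, data, epochs=15, lr=0.01, seed=seed)  # model_2 ("loaded")
print(z1 == z2)  # True: same start weights and same shuffle order give the same result
```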

According to this GitHub thread, https://github.com/keras-team/keras/issues/14986, setting the environment variable TF_DETERMINISTIC_OPS=1 is another way to ensure reproducibility in TensorFlow >= 2.5.

Answer:

Two things:

1. When training a model, the initial weights are random, i.e. the starting weights will be different each time (see here for how the weights are initialised: https://keras.io/api/layers/initializers/).
2. You said you're training for 15 epochs. Why are you fitting three times with 5 epochs each? Why not once with 15 epochs?
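A tiny plain-Python sketch of point 1, assuming a hypothetical `uniform_init` helper whose default range mirrors the defaults of Keras' RandomUniform initializer (minval=-0.05, maxval=0.05): two unseeded draws give different starting weights, while the same seed reproduces them exactly.

```python
import random

def uniform_init(n, rng, minval=-0.05, maxval=0.05):
    """Hypothetical initializer: draw n weights uniformly from [minval, maxval]."""
    return [rng.uniform(minval, maxval) for _ in range(n)]

# Unseeded: two freshly built "models" start from different weights.
a = uniform_init(4, random.Random())
b = uniform_init(4, random.Random())

# Seeded: the same seed reproduces the same starting weights.
c = uniform_init(4, random.Random(123))
d = uniform_init(4, random.Random(123))

print(a == b)  # almost certainly False
print(c == d)  # True
```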