I have a model based on TableNet and VGG19. The training data (Marmoot) and the save path are both mapped to a datalake storage (on Azure).
I'm trying to save it in the following ways and get the following errors on Databricks:
First approach:
import pickle
pickle.dump(model, open(filepath, 'wb'))
This saves the model and gives the following output:
WARNING:absl:Found untraced functions such as _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op while saving (showing 5 of 31). These functions will not be directly callable after loading.
Now when I try to reload the model using:
loaded_model = pickle.load(open(filepath, 'rb'))
I get the following error (Databricks shows the entire stderr and stdout in addition to this error, but this is the gist):
ValueError: Unable to restore custom object of type _tf_keras_metric. Please make sure that any custom layers are included in the `custom_objects` arg when calling `load_model()` and make sure that all layers implement `get_config` and `from_config`.
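For context (and this is confirmed by how things get resolved in the answer below), pickle is not the supported serialization path for Keras models; the supported round trip is roughly:

model.save(filepath)
loaded_model = keras.models.load_model(filepath)  # plus custom_objects for any custom layers/metrics, as shown further down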
Second approach:
model.save(filepath)
which gives the following error:
Fatal error: The Python kernel is unresponsive. The Python process exited with exit code 139 (SIGSEGV: Segmentation fault). The last 10 KB of the process's stderr and stdout can be found below. See driver logs for full logs.
---------------------------------------------------------------------------
Last messages on stderr:
Mon Jan 9 08:04:31 2023 Connection to spark from PID 1285
Mon Jan 9 08:04:31 2023 Initialized gateway on port 36597
Mon Jan 9 08:04:31 2023 Connected to spark.
2023-01-09 08:05:53.221618: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
and much more. It dumps the entire stderr and stdout (including all the training output), which makes it very hard to locate the actual error.
Third approach (partially):
I also tried:
model.save_weights(weights_path)
but once again I was unable to reload them (this is the approach I tried the least).
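For completeness, save_weights stores only the parameters, so reloading needs the identical architecture rebuilt first. A minimal sketch, with build_model standing in for whatever constructs the TableNet/VGG19 graph:

model = build_model()            # must recreate the exact same architecture
model.load_weights(weights_path)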
Also I tried saving the checkpoints by adding this:
model_checkpoint = tf.keras.callbacks.ModelCheckpoint(filepath, monitor = "val_table_mask_loss", verbose = 1, save_weights_only=True)
as a callback in the fit method, wired roughly as in the sketch below.
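A minimal sketch of the wiring (the dataset and epoch arguments here are placeholders, not my real ones):

model.fit(train_ds, validation_data=val_ds, epochs=num_epochs, callbacks=[model_checkpoint])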
But at the end of the first epoch it generates the following error (I show only the end of the traceback):
h5py/_objects.pyx in h5py._objects.with_phil.wrapper()
h5py/_objects.pyx in h5py._objects.with_phil.wrapper()
h5py/h5f.pyx in h5py.h5f.create()
OSError: Unable to create file (file signature not found)
When I use the second approach on a platform other than Databricks it works fine, but then when I try to load the model I get an error similar to the loading error from the first approach.
Update 1
The filepath variable that I try to save to is a dbfs reference, and my dbfs is mapped to the datalake storage.
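A plausible explanation for both the segmentation fault and the h5py "file signature not found" error is that HDF5 requires random-access writes, which the DBFS FUSE mount does not support. A common workaround, sketched here with a hypothetical mount path, is to save to the driver's local disk and copy the file over afterwards:

import shutil

local_path = "/tmp/model.h5"  # local driver disk, supports random writes
model.save(local_path)
shutil.copy(local_path, "/dbfs/mnt/my-datalake/model.h5")  # hypothetical DBFS path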
Update 2
When trying what was suggested in the comments, following the linked answer, I get the following error:
----> 3 model2 = keras.models.load_model("/tmp/model-full2.h5")
...
ValueError: Unknown layer: table_mask. Please ensure this object is passed to the `custom_objects` argument. See https://www.tensorflow.org/guide/keras/save_and_serialize#registering_the_custom_object for details.
Update 3
So I tried following the error message plus this answer:
model2 = keras.models.load_model("/tmp/model-full2.h5", custom_objects={'table_mask': table_mask})
but then I get the following error:
TypeError: 'KerasTensor' object is not callable
CodePudding user response:
Try making the following changes to your custom object(s), so they can be properly serialized and deserialized:
Add the keyword arguments to your constructor:
def __init__(self, **kwargs):
    super(TableMask, self).__init__(**kwargs)
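More generally, a custom layer survives save/load when every constructor argument is reflected in get_config. A minimal sketch of a fully serializable layer (the units field and the inner Conv2D are illustrative, not the real TableMask internals):

from tensorflow import keras

class TableMask(keras.layers.Layer):
    def __init__(self, units=128, **kwargs):
        super(TableMask, self).__init__(**kwargs)  # forwards name, dtype, etc.
        self.units = units
        self.conv = keras.layers.Conv2D(units, 3, padding="same")

    def call(self, inputs):
        return self.conv(inputs)

    def get_config(self):
        # Everything __init__ needs must appear here for from_config to work
        config = super(TableMask, self).get_config()
        config.update({"units": self.units})
        return config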
Rename the table_mask variable to TableMask (or otherwise stop it from shadowing the layer class): passing the KerasTensor named table_mask instead of the class is what makes deserialization try to call a tensor, hence the TypeError above. When you load your model, it will look something like this:
model = keras.models.load_model("/tmp/path", custom_objects={'TableMask': TableMask, 'CustomObj2': CustomObj2, 'CustomMetric': CustomMetric})
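In recent TF 2.x versions you can alternatively register each class once and skip passing custom_objects by hand (the defining module still has to be imported before load_model so the registration runs). A sketch:

from tensorflow import keras

@keras.utils.register_keras_serializable(package="tablenet")  # "tablenet" is an arbitrary namespace
class TableMask(keras.layers.Layer):
    ...  # same implementation as above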
Update from question author:
We found a few errors in my code:
- I had two custom layers with the same name as a variable (beginner's mistake)
- I needed to pass the custom objects to the load method via the custom_objects keyword, as the answer suggested
- I also needed to change the __init__ function as the answer suggests
- I had a custom scoring class that I also needed to add to custom_objects
I also used the answer that @AloneTogether suggested in the comments (that answer is how I chose to save and load the model, combined with the extra fixes listed above).
After all that, saving, loading, and predicting worked great.