Unable to load TensorFlow model with pickle

I am trying to use pickle to serialize TensorFlow models. Here is the code (dump.py) that saves the model to a pickle file:

import tensorflow as tf
import pickle
import numpy as np

tf.random.set_seed(42)

input_x = np.random.randint(0, 50000, (10000,1))
input_y = np.random.randint(0, 50000, (10000,1))
output = input_x + input_y
input = np.concatenate((input_x, input_y), axis=1)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(2, activation = tf.keras.activations.relu, input_shape=[2]),   
    tf.keras.layers.Dense(2, activation = tf.keras.activations.relu),
    tf.keras.layers.Dense(1),
])

model.compile(loss = tf.keras.losses.mae,
              optimizer=tf.optimizers.Adam(learning_rate=0.00001),
              metrics = ['mse'])
          
model.fit(input, output, epochs = 1000)

fl = open('D:/tf/tf.pkl', 'wb')
pickle.dump(model, fl)
fl.close()

Here is the code (load.py) to load the model from the pickle file:

import pickle

fl = open('D:/tf/tf.pkl', 'rb')
model = pickle.load(fl)
print(model.predict([[2.2, 5.1]]))
fl.close()

This works fine under Linux. When called from Windows, dump.py succeeds, however load.py fails with the following error message:

2022-08-09 19:48:30.078245: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2022-08-09 19:48:30.078475: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2022-08-09 19:48:32.847626: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'nvcuda.dll'; dlerror: nvcuda.dll not found
2022-08-09 19:48:32.847804: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2022-08-09 19:48:32.851014: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: DEVELOPER
2022-08-09 19:48:32.851211: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: DEVELOPER
2022-08-09 19:48:32.851607: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Traceback (most recent call last):
  File "D:\tf\create_model.py", line 29, in <module>
    model = pickle.load(fl)
  File "C:\Users\developer\AppData\Local\Programs\Python\Python39\lib\site-packages\keras\saving\pickle_utils.py", line 48, in deserialize_model_from_bytecode
    model = save_module.load_model(temp_dir)
  File "C:\Users\developer\AppData\Local\Programs\Python\Python39\lib\site-packages\keras\utils\traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "C:\Users\developer\AppData\Local\Programs\Python\Python39\lib\site-packages\tensorflow\python\saved_model\load.py", line 977, in load_internal
    raise FileNotFoundError(
FileNotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for ram://5488f35a-e52b-472b-b9d6-110c8b5a3aaf/variables/variables
 You may be trying to load on a different device from the computational device. Consider setting the `experimental_io_device` option in `tf.saved_model.LoadOptions` to the io_device such as '/job:localhost'.

How can I fix this?

Answer:

As the error says, this problem can occur when "you may be trying to load on a different device from the computational device".

The error does not come directly from pickle but from TensorFlow itself. As the stack trace shows, it is raised when this line executes:

model = save_module.load_model(temp_dir)

So under the hood, pickle simply loads the model through TensorFlow's SavedModel format. The fix would be to pass the load options suggested in the error message. However, they would have to go into that load_model call, and I don't think that can be done through pickle.load().

If you don't have a particular reason to use pickle rather than TensorFlow's own utilities, I suggest switching to save and load_model directly.

In that case you could use code like this:

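# save (model_dir is a directory path of your choice)
save_option = tf.saved_model.SaveOptions(experimental_io_device="/job:localhost")
model.save(model_dir, options=save_option)

# load (load_model takes LoadOptions rather than SaveOptions)
load_option = tf.saved_model.LoadOptions(experimental_io_device="/job:localhost")
loaded_model = tf.keras.models.load_model(model_dir, options=load_option)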

Otherwise, if you really want to keep pickle, you could pickle only the weights of your model. Calling weights = model.get_weights() returns all the weights as a list of NumPy arrays, which you can pickle directly. On the other device, you then re-create the architecture and load the weights with model.set_weights(weights).

The major drawback is that you have to copy the architecture code to the destination device, because you have to re-create the structure of the model yourself. If you're okay with that, this should work.
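
The weights-only approach could look roughly like the sketch below. It is an illustration under assumptions, not a drop-in replacement for dump.py and load.py: build_model is a hypothetical helper that has to mirror the architecture from the question, the weights are left untrained to keep it short, and the pickled bytes stay in memory instead of going to a .pkl file.

```python
import pickle

import numpy as np
import tensorflow as tf


def build_model():
    # Hypothetical helper: it must recreate exactly the same architecture
    # on both the saving and the loading side.
    return tf.keras.Sequential([
        tf.keras.Input(shape=(2,)),
        tf.keras.layers.Dense(2, activation="relu"),
        tf.keras.layers.Dense(2, activation="relu"),
        tf.keras.layers.Dense(1),
    ])


# Saving side: get_weights() returns a plain list of NumPy arrays,
# which pickle can serialize without going through SavedModel.
source_model = build_model()
payload = pickle.dumps(source_model.get_weights())

# Loading side: rebuild the architecture, then restore the weights.
restored_model = build_model()
restored_model.set_weights(pickle.loads(payload))

# Both models should now give the same prediction for the same input.
sample = np.array([[2.2, 5.1]], dtype="float32")
same = np.allclose(
    source_model.predict(sample, verbose=0),
    restored_model.predict(sample, verbose=0),
)
```

In practice you would write payload to a file on one machine and read it back on the other. Only the weights travel this way, so the optimizer state and compile settings are not restored, and you need to call compile again if you want to continue training.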