I have been trying to save a movie recommendation model from GitHub so that I can then serve it using tf-serving. The code below first creates a list of tagged documents from my corpus and then trains vectors based on that list:
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

mv_tags_doc = [TaggedDocument(words=D, tags=[str(i)]) for i, D in enumerate(mv_tags_corpus)]

max_epochs = 50
vec_size = 20  # defined but not passed to Doc2Vec below
alpha = 0.025

model = Doc2Vec(alpha=alpha,
                min_alpha=0.00025,
                min_count=1,
                dm=0)  # paragraph vector distributed bag-of-words (PV-DBOW)

model.build_vocab(mv_tags_doc)

print('Epoch', end=': ')
for epoch in range(max_epochs):
    print(epoch, end=' ')
    model.train(mv_tags_doc,
                total_examples=model.corpus_count,
                epochs=model.epochs)
    # decrease the learning rate
    model.alpha -= 0.0002
    # fix the learning rate, no decay
    model.min_alpha = model.alpha
When I try to save it following the documentation:
import tempfile

MODEL_DIR = tempfile.gettempdir()
version = 1
export_path = os.path.join(MODEL_DIR, str(version))
print('export_path = {}\n'.format(export_path))

tf.keras.models.save_model(
    model,
    export_path,
    overwrite=True,
    include_optimizer=True,
    save_format=None,
    signatures=None,
    options=None
)

print('\nSaved model:')
!ls -l {export_path}
I get this error
AttributeError Traceback (most recent call last)
/tmp/ipykernel_563154/3914941631.py in <module>
6 print('export_path = {}\n'.format(export_path))
7
----> 8 tf.keras.models.save_model(
9 model,
10 export_path,
~/anaconda3/lib/python3.9/site-packages/keras/utils/traceback_utils.py in error_handler(*args, **kwargs)
65 except Exception as e: # pylint: disable=broad-except
66 filtered_tb = _process_traceback_frames(e.__traceback__)
---> 67 raise e.with_traceback(filtered_tb) from None
68 finally:
69 del filtered_tb
~/anaconda3/lib/python3.9/site-packages/keras/saving/saving_utils.py in try_build_compiled_arguments(model)
319 def try_build_compiled_arguments(model):
320 if (not version_utils.is_v1_layer_or_model(model) and
--> 321 model.outputs is not None):
322 try:
323 if not model.compiled_loss.built:
AttributeError: 'Doc2Vec' object has no attribute 'outputs'
Answer:
I wouldn't expect the tf.keras.models.save_model() function, which judging by its name is specific to TensorFlow & Keras, to work on a Gensim Doc2Vec model, which is not part of, related to, or built upon either TensorFlow or Keras.
Looking at the docs for save_model(), I see its declared functionality is:
"Saves a model as a TensorFlow SavedModel or HDF5 file."
Neither "TensorFlow SavedModel" nor "HDF5 file" should be expected as sufficient formats to save another project's custom model (in this case a Gensim Doc2Vec object), unless it specifically claimed that as a capability. So some sort of failure or error here is expected behavior.
If your real goal is simply to be able to re-load the model later, don't involve TensorFlow/Keras at all. You could either:
- use Python's internal pickle mechanism, or
- use the .save(fname) method native to model classes in the Gensim package, which uses its own pickle-and-numpy-based save format. For example:
filename = 'my_doc2vec_model'
initial_model.save(filename)
Note that such saves may be spread over several related files alongside each other, all starting with the same string you provided, which should be kept together. (That is, after the code above, be sure to keep any and all files that begin with the string 'my_doc2vec_model' together.)
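One way to keep those files together when moving a saved model is to copy everything that shares the prefix. This is a minimal sketch using only the standard library; copy_model_files is a hypothetical helper, and the suffixes in the demo are made-up stand-ins, since the actual file names depend on the Gensim version and model size.

```python
import glob
import os
import shutil
import tempfile


def copy_model_files(prefix, dest_dir):
    """Copy every file whose path starts with `prefix` into `dest_dir`."""
    os.makedirs(dest_dir, exist_ok=True)
    copied = []
    # glob.escape guards against special characters in the prefix itself.
    for path in sorted(glob.glob(glob.escape(prefix) + "*")):
        if os.path.isfile(path):
            shutil.copy2(path, dest_dir)
            copied.append(os.path.basename(path))
    return copied


# Demo with empty stand-in files; the suffixes are hypothetical.
with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    prefix = os.path.join(src, "my_doc2vec_model")
    for suffix in ("", ".part1.npy", ".part2.npy"):
        open(prefix + suffix, "wb").close()
    open(os.path.join(src, "unrelated.txt"), "wb").close()
    copied_files = copy_model_files(prefix, dst)

print(copied_files)
```

The unrelated file is left behind because it does not start with the prefix.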
You'd then re-load by calling .load() on the expected model class:
reloaded_model = Doc2Vec.load(filename)
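For the pickle option mentioned above, the round trip looks roughly like this. It is only a sketch: the dict is a hypothetical stand-in for a trained model object, and the file name is arbitrary. Only unpickle files you trust, since loading a pickle can execute arbitrary code.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a trained model: tag -> vector.
original = {"0": [0.1, 0.2], "1": [0.3, 0.4]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.pkl")

    # Serialize the object to a single file.
    with open(path, "wb") as f:
        pickle.dump(original, f)

    # Later (possibly in another process), read it back.
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(restored == original)
```

For large models the native .save() is usually the more practical choice, because it can store the big arrays in separate numpy files.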
Separately: your Doc2Vec code shows a number of bad practices. Using such a low min_count=1 is almost always a bad idea with this sort of algorithm, slowing training & worsening results. And decrementing the alpha yourself, in your own loop calling .train() multiple times, is unnecessarily complex & error-prone. Whatever template/tutorial suggested that approach is probably a bad one.
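To see why the hand-rolled loop is error-prone, it helps to tally what it actually does. The sketch below does no training; it only tracks the values the loop in the question assigns. manual_loop_totals is a hypothetical helper, and epochs_per_call=10 is an assumption, because the question never sets model.epochs and Gensim's default depends on the version.

```python
def manual_loop_totals(start_alpha, step, outer_loops, epochs_per_call):
    """Track the values the hand-written loop assigns, without training."""
    alpha = start_alpha
    total_passes = 0
    for _ in range(outer_loops):
        # Each train() call makes `epochs_per_call` passes over the corpus.
        total_passes += epochs_per_call
        # Mirrors `model.alpha -= 0.0002` from the question.
        alpha -= step
    return alpha, total_passes


final_alpha, total_passes = manual_loop_totals(
    start_alpha=0.025, step=0.0002, outer_loops=50, epochs_per_call=10
)
print(round(final_alpha, 6), total_passes)
```

Under that assumption the corpus is traversed 500 times rather than the intended 50, and alpha ends near 0.015, nowhere near the min_alpha=0.00025 the model was constructed with. A single .train() call with the desired number of epochs avoids both surprises.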