'Doc2Vec' object has no attribute 'outputs', while saving doc2vec for tensorflow

I have been trying to save a movie recommendation model from GitHub so that I can serve it with tf-serving. The code below first builds a list of tagged documents from my corpus and then trains document vectors on them:

from gensim.models.doc2vec import Doc2Vec, TaggedDocument

mv_tags_doc = [TaggedDocument(words=D, tags=[str(i)]) for i, D in enumerate(mv_tags_corpus)]

max_epochs = 50
vec_size = 20
alpha = 0.025

model = Doc2Vec(vector_size=vec_size,  # vec_size was defined above but never passed in
                alpha=alpha,
                min_alpha=0.00025,
                min_count=1,
                dm=0)  # paragraph vector distributed bag-of-words (PV-DBOW)
 
model.build_vocab(mv_tags_doc)

print('Epoch', end = ': ')
for epoch in range(max_epochs):
    print(epoch, end=' ')
    model.train(mv_tags_doc,
                total_examples=model.corpus_count,
                epochs=model.epochs)
    # decrease the learning rate
    model.alpha -= 0.0002
    # fix the learning rate, no decay
    model.min_alpha = model.alpha

When I try to save it with tf.keras.models.save_model(), following the TensorFlow documentation:

import os
import tempfile

import tensorflow as tf

MODEL_DIR = tempfile.gettempdir()
version = 1
export_path = os.path.join(MODEL_DIR, str(version))
print('export_path = {}\n'.format(export_path))

tf.keras.models.save_model(
    model,
    export_path,
    overwrite=True,
    include_optimizer=True,
    save_format=None,
    signatures=None,
    options=None
)

print('\nSaved model:')
!ls -l {export_path}

I get this error:

AttributeError                            Traceback (most recent call last)
/tmp/ipykernel_563154/3914941631.py in <module>
      6 print('export_path = {}\n'.format(export_path))
      7 
----> 8 tf.keras.models.save_model(
      9     model,
     10     export_path,

~/anaconda3/lib/python3.9/site-packages/keras/utils/traceback_utils.py in error_handler(*args, **kwargs)
     65     except Exception as e:  # pylint: disable=broad-except
     66       filtered_tb = _process_traceback_frames(e.__traceback__)
---> 67       raise e.with_traceback(filtered_tb) from None
     68     finally:
     69       del filtered_tb

~/anaconda3/lib/python3.9/site-packages/keras/saving/saving_utils.py in try_build_compiled_arguments(model)
    319 def try_build_compiled_arguments(model):
    320   if (not version_utils.is_v1_layer_or_model(model) and
--> 321       model.outputs is not None):
    322     try:
    323       if not model.compiled_loss.built:

AttributeError: 'Doc2Vec' object has no attribute 'outputs'

Answer:

I wouldn't expect the tf.keras.models.save_model() function, which its name suggests is specific to TensorFlow and Keras, to work on a Gensim Doc2Vec model, which is not part of, related to, or built on either TensorFlow or Keras.

Looking at the docs for save_model(), I see its declared functionality is:

Saves a model as a TensorFlow SavedModel or HDF5 file.

Neither "TensorFlow SavedModel" nor "HDF5 file" should be expected to be a suitable format for saving another project's custom model (in this case a Gensim Doc2Vec object), unless save_model() specifically claimed that as a capability. So some sort of failure or error here is expected behavior.

If your real goal is simply to be able to re-load the model later, don't involve TensorFlow/Keras at all. You could either:

use Python's built-in pickle mechanism, or
use the .save(fname) method native to the model classes in the Gensim package, which uses its own pickle-and-numpy-based save format. For example:

filename = 'my_doc2vec_model'
model.save(filename)  # 'model' is the trained Doc2Vec instance from the question

Note that such saves may be spread over several related files alongside each other, all starting with the same string you provided, which should be kept together. (That is, after the code above, be sure to keep any and all files that begin with the string 'my_doc2vec_model' together.)

You'd then re-load by calling .load() on the expected model class:

reloaded_model = Doc2Vec.load(filename)

Separately: your Doc2Vec code shows a number of bad practices. Using a min_count as low as 1 is almost always a bad idea with this sort of algorithm: it slows training and worsens results. And decrementing alpha yourself, in your own loop that calls .train() multiple times, is unnecessarily complex and error-prone. Whatever template or tutorial suggested that approach is probably a bad one.