I am using tensorflow.keras to predict what an email is about from the email's sender, subject, and content.
# Use tokenizers to convert the email data into model input sequences
subject_sequence = subject_tk.texts_to_sequences(subject_series)
subject_sequence = sequence.pad_sequences(subject_sequence, maxlen = subject_length)
sender_sequence = subject_tk.texts_to_sequences(sender_series)
sender_sequence = sequence.pad_sequences(sender_sequence, maxlen = sender_length)
body_sequence = body_tk.texts_to_sequences(body_series)
body_sequence = sequence.pad_sequences(body_sequence, maxlen = body_length)
# Run the classification model on the input sequences and print the prediction
predication = email_classification_model.predict([subject_sequence, sender_sequence , body_sequence])
print(predication)
However, I noticed that sometimes (roughly 10% of the time) the prediction fails with the following error:
File "mailMonitory.py", line 102, in OnItemAdd
predication = email_classification_model.predict([subject_sequence, sender_sequence , body_sequence])
....
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[0,253] = 3686 is not in [0, 1897)
[[node model/embedding_1/embedding_lookup (defined at mailMonitory.py:102) ]] [Op:__inference_predict_function_4617]
print(sender_sequence)
[[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
693 3686 139 169]]
From my testing, the problem is always that the tokenizer converts the sender email into a sequence containing an index that is out of bounds for the model. Why is this happening? Does my tokenizer not contain enough data, or is there something wrong with my model? How can I fix this?
Answer:
You usually get this error when you feed integer values to your Embedding layer that are beyond the size of the layer's defined input_dim. In the example below, the first sequence works because all of its values are < input_dim, while the second sequence throws an exception because almost all of its values fall outside that range:
import tensorflow as tf

# Toy model: the Embedding layer only accepts indices in [0, input_dim) = [0, 10)
input = tf.keras.layers.Input(shape=(5,))
output = tf.keras.layers.Embedding(input_dim=10, output_dim=5)(input)
model = tf.keras.models.Model(input, output)

# All indices are < 10, so the lookup succeeds
print(model(tf.constant([1, 5, 2, 6, 8])))
# 12, 18, 19, 10 and 4000 are all >= 10, so the lookup raises InvalidArgumentError
print(model(tf.constant([1, 12, 18, 19, 10, 4000])))
tf.Tensor(
[[-0.03517901 0.01769676 0.01823583 0.01846877 -0.01214858]
[-0.04662237 -0.01376029 0.04361605 0.0426343 -0.01796628]
[ 0.020581 0.02564194 0.00014243 0.03558977 0.01154976]
[-0.01251727 0.00095896 0.00218729 -0.01606169 0.02248188]
[ 0.03368715 0.01532438 -0.01821761 0.00139984 0.00360139]], shape=(5, 5), dtype=float32)
---------------------------------------------------------------------------
InvalidArgumentError Traceback (most recent call last)
<ipython-input-3-cd27383a1b70> in <module>()
6
7 print(model(tf.constant([1, 5, 2, 6, 8])))
----> 8 print(model(tf.constant([1, 12, 18, 19, 10, 4000])))
...
InvalidArgumentError: Exception encountered when calling layer "embedding_1" (type Embedding).
indices[1] = 12 is not in [0, 10) [Op:ResourceGather]
Call arguments received:
• inputs=tf.Tensor(shape=(6,), dtype=float32)
So, the solution is to make sure you are using the correct size for the input_dim parameter: it must be large enough to cover every index your tokenizer can produce (typically len(tokenizer.word_index) + 1, since index 0 is reserved for padding).
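One way to keep things in sync is to derive input_dim from the fitted tokenizer and cap the tokenizer's output with num_words plus an oov_token, so prediction-time text can never produce an index the Embedding layer cannot look up. This is only a minimal sketch built around the sender input: sender_tk mirrors the name from your question, while the training texts, MAX_WORDS, sender_length and output_dim are made-up placeholder values:
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Hypothetical training texts for the sender input (stand-ins for your real data)
sender_train_texts = ["alice@example.com", "bob@example.com"]

# num_words caps the indices texts_to_sequences can ever emit, and oov_token maps
# unseen words to index 1 instead of dropping them
MAX_WORDS = 2000
sender_tk = Tokenizer(num_words=MAX_WORDS, oov_token="<OOV>")
sender_tk.fit_on_texts(sender_train_texts)

# input_dim must cover every index the tokenizer can produce; index 0 is reserved
# for padding, so the safe size is vocabulary size + 1, capped by num_words
sender_vocab_size = min(len(sender_tk.word_index) + 1, MAX_WORDS)

sender_length = 256  # assumed padding length
sender_input = tf.keras.layers.Input(shape=(sender_length,))
sender_embedded = tf.keras.layers.Embedding(input_dim=sender_vocab_size,
                                            output_dim=16)(sender_input)

# At prediction time, encode with the SAME fitted tokenizer the model was built against
new_sequences = sender_tk.texts_to_sequences(["carol@example.com"])
new_sequences = pad_sequences(new_sequences, maxlen=sender_length)
The same pairing applies to the subject and body inputs: whichever tokenizer produces a sequence at prediction time must be the one whose vocabulary size was used as input_dim when that branch of the model was built.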