What method will be called when executing embedding_layer(tf.constant([1, 2, 3]))


The following code is excerpted from this guide:

https://www.tensorflow.org/text/guide/word_embeddings

import tensorflow as tf

# Embed a 1,000 word vocabulary into 5 dimensions.
embedding_layer = tf.keras.layers.Embedding(1000, 5)
print("embedding_layer: {}".format(embedding_layer))

result = embedding_layer(tf.constant([1, 2, 3]))
print("result: {}".format(result.numpy()))

embedding_layer: <keras.layers.embeddings.Embedding object at 0x7ffb180b17f0>
result: [[-0.04678862 -0.03500976 -0.04254207 -0.0452533   0.04933525]
 [-0.0366199  -0.01814463  0.04166402  0.02388224  0.03472105]
 [ 0.02966919  0.04294082  0.00715581  0.0376732   0.00529655]]

When executing embedding_layer(tf.constant([1, 2, 3])), which method of the tf.keras.layers.Embedding class is called?

Is it the __init__ method?

The following code throws the error below:

embedding_layer = tf.keras.layers.Embedding(tf.constant([1, 2, 3]))

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_4537/3254652836.py in <cell line: 10>()
      8 print("result: {}".format(result.numpy()))
      9 
---> 10 embedding_layer = tf.keras.layers.Embedding(tf.constant([1, 2, 3]))

TypeError: __init__() missing 1 required positional argument: 'output_dim'
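The error has nothing to do with the tensor itself: tf.constant([1, 2, 3]) is simply being bound to the first positional parameter, input_dim, leaving the required output_dim unset. The same failure mode can be reproduced with a plain Python function (a sketch, not the real Keras code):

```python
# Minimal sketch of Embedding.__init__'s signature: two required
# positional parameters, so passing a single argument fails the same way.
def init(input_dim, output_dim, embeddings_initializer='uniform'):
    return (input_dim, output_dim)

try:
    init([1, 2, 3])  # mimics tf.keras.layers.Embedding(tf.constant([1, 2, 3]))
except TypeError as e:
    print(e)  # missing 1 required positional argument: 'output_dim'
```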

Here is the source code of Embedding:

class Embedding(Layer):
    
  def __init__(self,
               input_dim,
               output_dim,
               embeddings_initializer='uniform',
               embeddings_regularizer=None,
               activity_regularizer=None,
               embeddings_constraint=None,
               mask_zero=False,
               input_length=None,
               **kwargs):
    if 'input_shape' not in kwargs:
      if input_length:
        kwargs['input_shape'] = (input_length,)
      else:
        kwargs['input_shape'] = (None,)
    if input_dim <= 0 or output_dim <= 0:
      raise ValueError('Both `input_dim` and `output_dim` should be positive, '
                       'found input_dim {} and output_dim {}'.format(
                           input_dim, output_dim))
    if (not base_layer_utils.v2_dtype_behavior_enabled() and
        'dtype' not in kwargs):
      # In TF1, the dtype defaults to the input dtype which is typically int32,
      # so explicitly set it to floatx
      kwargs['dtype'] = backend.floatx()
    # We set autocast to False, as we do not want to cast floating-point inputs
    # to self.dtype. In call(), we cast to int32, and casting to self.dtype
    # before casting to int32 might cause the int32 values to be different due
    # to a loss of precision.
    kwargs['autocast'] = False
    super(Embedding, self).__init__(**kwargs)

    self.input_dim = input_dim
    self.output_dim = output_dim
    self.embeddings_initializer = initializers.get(embeddings_initializer)
    self.embeddings_regularizer = regularizers.get(embeddings_regularizer)
    self.activity_regularizer = regularizers.get(activity_regularizer)
    self.embeddings_constraint = constraints.get(embeddings_constraint)
    self.mask_zero = mask_zero
    self.supports_masking = mask_zero
    self.input_length = input_length

CodePudding user response:

When running the line below, Python invokes the layer's __call__ method (inherited from the base Layer class), which ends up dispatching to the layer's call method:

result = embedding_layer(tf.constant([1, 2, 3]))

It is important to note that an Embedding layer must be constructed before it is used, but __init__ only stores the configuration: the vocabulary size and the embedding dimension (1000 and 5 in your case). The lookup table itself is created lazily in build(), which the base Layer's __call__ runs the first time the layer is invoked, before delegating to call(). Each 5-dimensional vector is drawn from a uniform distribution unless a different embeddings_initializer is specified. I would recommend reading how the embeddings variable is created in the build method.
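The dispatch order can be illustrated with a minimal pure-Python sketch. This is not the real Keras implementation (the actual base Layer does far more, such as graph tracing and weight tracking); it only mimics the __call__ -> build -> call protocol described above:

```python
import random

class MiniEmbedding:
    """Sketch of a Keras-style layer: config in __init__, weights in build,
    computation in call, all tied together by __call__."""

    def __init__(self, input_dim, output_dim):
        # Only stores configuration; no weights exist yet.
        self.input_dim = input_dim
        self.output_dim = output_dim
        self.built = False
        self.embeddings = None

    def build(self):
        # Create the lookup table lazily, on the first invocation.
        rng = random.Random(0)
        self.embeddings = [
            [rng.uniform(-0.05, 0.05) for _ in range(self.output_dim)]
            for _ in range(self.input_dim)
        ]
        self.built = True

    def call(self, indices):
        # Row lookup, analogous to what Embedding.call does with the indices.
        return [self.embeddings[i] for i in indices]

    def __call__(self, indices):
        if not self.built:
            self.build()
        return self.call(indices)

layer = MiniEmbedding(1000, 5)   # __init__ only
result = layer([1, 2, 3])        # __call__ -> build (first time) -> call
print(len(result), len(result[0]))
```

Calling layer([1, 2, 3]) a second time skips build, since the table already exists, and goes straight to call.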
