The following code is excerpted from the TensorFlow word-embeddings guide:
https://www.tensorflow.org/text/guide/word_embeddings
import tensorflow as tf
# Embed a 1,000 word vocabulary into 5 dimensions.
embedding_layer = tf.keras.layers.Embedding(1000, 5)
print("embedding_layer: {}".format(embedding_layer))
result = embedding_layer(tf.constant([1, 2, 3]))
print("result: {}".format(result.numpy()))
embedding_layer: <keras.layers.embeddings.Embedding object at 0x7ffb180b17f0>
result: [[-0.04678862 -0.03500976 -0.04254207 -0.0452533   0.04933525]
 [-0.0366199  -0.01814463  0.04166402  0.02388224  0.03472105]
 [ 0.02966919  0.04294082  0.00715581  0.0376732   0.00529655]]
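As a quick sanity check (not part of the guide), the rows above are simply rows 1, 2 and 3 of the layer's weight matrix, which exists once the first call has built the layer:

import numpy as np

weights = embedding_layer.get_weights()[0]              # NumPy array of shape (1000, 5)
print(np.allclose(result.numpy(), weights[[1, 2, 3]]))  # True: the layer is a row lookup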
When executing "embedding_layer(tf.constant([1, 2, 3]))", which method of the tf.keras.layers.Embedding class is called? Is it the __init__ method?
The following code will throw the following error:
embedding_layer = tf.keras.layers.Embedding(tf.constant([1, 2, 3]))
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/tmp/ipykernel_4537/3254652836.py in <cell line: 10>()
8 print("result: {}".format(result.numpy()))
9
---> 10 embedding_layer = tf.keras.layers.Embedding(tf.constant([1, 2, 3]))
TypeError: __init__() missing 1 required positional argument: 'output_dim'
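For reference, the constructor only accepts the table sizes; the token ids belong in the call itself, e.g.:

embedding_layer = tf.keras.layers.Embedding(input_dim=1000, output_dim=5)  # sizes -> __init__
result = embedding_layer(tf.constant([1, 2, 3]))                           # ids -> __call__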
Here is the source code of Embedding:
class Embedding(Layer):

  def __init__(self,
               input_dim,
               output_dim,
               embeddings_initializer='uniform',
               embeddings_regularizer=None,
               activity_regularizer=None,
               embeddings_constraint=None,
               mask_zero=False,
               input_length=None,
               **kwargs):
    if 'input_shape' not in kwargs:
      if input_length:
        kwargs['input_shape'] = (input_length,)
      else:
        kwargs['input_shape'] = (None,)
    if input_dim <= 0 or output_dim <= 0:
      raise ValueError('Both `input_dim` and `output_dim` should be positive, '
                       'found input_dim {} and output_dim {}'.format(
                           input_dim, output_dim))
    if (not base_layer_utils.v2_dtype_behavior_enabled() and
        'dtype' not in kwargs):
      # In TF1, the dtype defaults to the input dtype which is typically int32,
      # so explicitly set it to floatx
      kwargs['dtype'] = backend.floatx()
    # We set autocast to False, as we do not want to cast floating-point inputs
    # to self.dtype. In call(), we cast to int32, and casting to self.dtype
    # before casting to int32 might cause the int32 values to be different due
    # to a loss of precision.
    kwargs['autocast'] = False
    super(Embedding, self).__init__(**kwargs)
    self.input_dim = input_dim
    self.output_dim = output_dim
    self.embeddings_initializer = initializers.get(embeddings_initializer)
    self.embeddings_regularizer = regularizers.get(embeddings_regularizer)
    self.activity_regularizer = regularizers.get(activity_regularizer)
    self.embeddings_constraint = constraints.get(embeddings_constraint)
    self.mask_zero = mask_zero
    self.supports_masking = mask_zero
    self.input_length = input_length
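For completeness, here is a condensed sketch of the layer's build and call methods (paraphrased from the Keras source; exact details vary between versions), which is where the lookup table is actually created and used:

def build(self, input_shape):
  # Creates the trainable (input_dim, output_dim) lookup table on first use.
  self.embeddings = self.add_weight(
      shape=(self.input_dim, self.output_dim),
      initializer=self.embeddings_initializer,
      name='embeddings',
      regularizer=self.embeddings_regularizer,
      constraint=self.embeddings_constraint)
  self.built = True

def call(self, inputs):
  # Casts the ids to int32 and gathers the matching rows of the table.
  if inputs.dtype not in ('int32', 'int64'):
    inputs = tf.cast(inputs, 'int32')
  return tf.nn.embedding_lookup(self.embeddings, inputs)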
CodePudding user response:
The call method is invoked when running:
result = embedding_layer(tf.constant([1, 2, 3]))
It is important to note that an Embedding layer first needs to be initialized before being used, but __init__ only stores the configuration: the vocabulary size and the embedding dimension (1000 and 5 in your case). The lookup table itself is created lazily the first time the layer is called: Layer.__call__ runs build once to allocate the (1000, 5) weight matrix and then dispatches to call, which performs the lookup. Each 5-dimensional vector is drawn from a uniform distribution unless you pass a different embeddings_initializer. I would recommend checking how the embeddings are created in the build method.
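One way to see the dispatch order for yourself (an illustrative sketch, not from the original answer) is to subclass Embedding and log each hook:

import tensorflow as tf

class VerboseEmbedding(tf.keras.layers.Embedding):
  def build(self, input_shape):
    print("build() called, input_shape =", input_shape)
    super().build(input_shape)

  def call(self, inputs):
    print("call() called")
    return super().call(inputs)

layer = VerboseEmbedding(1000, 5)    # __init__ runs here; no weights yet
out = layer(tf.constant([1, 2, 3]))  # first call: build() once, then call()
out = layer(tf.constant([4, 5]))     # later calls: call() only

Running this prints build() exactly once, which confirms that __init__ only configures the layer while __call__ handles building the weights and the actual lookup.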