I'm having trouble understanding what a 1D global average pooling does to an embedding layer. I know that embedding layers are like lookup tables. If I have tf.keras.layers.Embedding(input_dim=30, output_dim=7, input_length=10) (vocabulary size 30, embedding dimension 7), is the output after feed-forwarding a matrix of 10 rows x 7 columns, or a 3D tensor of 1 row x 7 columns x 10 length?
If it's 10 rows x 7 columns, does it take the average of each row and output a single vector of shape 10 rows x 1 column?
If it's 1 row x 7 columns x 10 length, does it take the average of each vector and output a single vector, also of shape 10 rows x 1 column?
CodePudding user response:
To your first question: What's the output of an Embedding layer in TensorFlow?
The Embedding layer maps each integer value in a sequence (each representing a unique word in the vocabulary) to a 7-dimensional vector. In the following example, you have two sequences with 10 integer values each. These integer values can range from 0 to 29, where 30 is the size of the vocabulary. Each integer value of each sequence is mapped to a 7-dimensional vector, resulting in the output shape (2, 10, 7), where 2 is the number of samples, 10 is the sequence length, and 7 is the dimension of the vector each integer is mapped to:
import tensorflow as tf

samples = 2
# two sequences of 10 random integer word ids in [0, 30)
texts = tf.random.uniform((samples, 10), maxval=30, dtype=tf.int32)
# vocabulary size 30, embedding dimension 7
embedding_layer = tf.keras.layers.Embedding(30, 7, input_length=10)
print(embedding_layer(texts))  # shape (2, 10, 7)
tf.Tensor(
[[[ 0.0225671 0.02347589 0.00979777 0.00041901 -0.00628462
0.02810872 -0.00962182]
[-0.00848696 -0.04342243 -0.02836052 -0.00517335 -0.0061365
-0.03012114 0.01677728]
[ 0.03311044 0.00556745 -0.00702027 0.03381392 -0.04623893
0.04987461 -0.04816799]
[-0.03521906 0.0379228 0.03005264 -0.0020758 -0.0384485
0.04822161 -0.02092661]
[-0.03521906 0.0379228 0.03005264 -0.0020758 -0.0384485
0.04822161 -0.02092661]
[-0.01790254 -0.0175228 -0.01194855 -0.02171307 -0.0059397
0.02812174 0.01709754]
[ 0.03117083 0.03501941 0.01058724 0.0452967 -0.03717183
-0.04691924 0.04459465]
[-0.0225444 0.01631368 -0.04825303 0.02976335 0.03874404
0.01886607 -0.04535152]
[-0.01405543 -0.01035894 -0.01828993 0.01214089 -0.0163126
0.00249451 -0.03320551]
[-0.00536104 0.04976835 0.03676006 -0.04985759 -0.04882429
0.04079831 -0.04694915]]
[[ 0.02474061 0.04651412 0.01263839 0.02834389 0.01770737
0.027616 0.0391163 ]
[-0.00848696 -0.04342243 -0.02836052 -0.00517335 -0.0061365
-0.03012114 0.01677728]
[-0.02423838 0.00046005 0.01264722 -0.00118362 -0.04956226
-0.00222496 0.00678415]
[ 0.02132202 0.02490019 0.015528 0.01769954 0.03830704
-0.03469712 -0.00817447]
[-0.03713315 -0.01064591 0.0106518 -0.00899752 -0.04772154
0.03767705 -0.02580358]
[ 0.02132202 0.02490019 0.015528 0.01769954 0.03830704
-0.03469712 -0.00817447]
[ 0.00416059 -0.03158562 0.00862025 -0.03387908 0.02394537
-0.00088609 0.01963869]
[-0.0454465 0.03087567 -0.01201812 -0.02580545 0.02585572
-0.00974055 -0.02253721]
[-0.00438716 0.03688161 0.04575384 -0.01561296 -0.0137012
-0.00927494 -0.02183568]
[ 0.0225671 0.02347589 0.00979777 0.00041901 -0.00628462
0.02810872 -0.00962182]]], shape=(2, 10, 7), dtype=float32)
When working with text data, the output of this Embedding layer would correspond to 2 sentences of 10 words each, where each word is mapped to a 7-dimensional vector.
If you are wondering where the random numbers for each integer in each sequence come from: by default, the Embedding layer initializes its lookup table from a uniform distribution, and these values are then learned during training.
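If you want to look at (or control) that lookup table yourself, here is a minimal sketch: passing an explicit embeddings_initializer (the RandomUniform values below just reproduce the Keras default 'uniform') and inspecting the weight matrix with get_weights().
import tensorflow as tf

# the explicit initializer reproduces the Keras default ('uniform')
embedding_layer = tf.keras.layers.Embedding(
    30, 7, input_length=10,
    embeddings_initializer=tf.keras.initializers.RandomUniform(-0.05, 0.05))
_ = embedding_layer(tf.zeros((1, 10), dtype=tf.int32))  # call once so the weights are created
embedding_matrix = embedding_layer.get_weights()[0]
print(embedding_matrix.shape)  # (30, 7): one 7-dimensional vector per vocabulary entry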
To your second question: What does a 1D global average pooling do to an Embedding layer?
The GlobalAveragePooling1D layer does nothing more than calculate the average over one dimension of a tensor. Because the following example passes data_format="channels_first" to the pooling layer, it averages the 7 numbers representing each word and returns a scalar per word, resulting in the output shape (2, 10), where 2 is the number of samples (sentences) and 10 is the number of per-word averages. This is equivalent to simply doing tf.reduce_mean(embedding_layer(texts), axis=-1).
import tensorflow as tf

samples = 2
texts = tf.random.uniform((samples, 10), maxval=30, dtype=tf.int32)
embedding_layer = tf.keras.layers.Embedding(30, 7, input_length=10)
# "channels_first" averages over the last axis (the 7 embedding values per word)
average_layer = tf.keras.layers.GlobalAveragePooling1D(data_format="channels_first")
print(average_layer(embedding_layer(texts)))  # shape (2, 10)
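To verify the tf.reduce_mean equivalence mentioned above, a quick check reusing texts, embedding_layer and average_layer from the snippet:
pooled = average_layer(embedding_layer(texts))
manual = tf.reduce_mean(embedding_layer(texts), axis=-1)
print(pooled.shape, manual.shape)                     # (2, 10) (2, 10)
print(tf.reduce_all(tf.abs(pooled - manual) < 1e-6))  # tf.Tensor(True, shape=(), dtype=bool)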
CodePudding user response:
GlobalAveragePooling1D reduces the dimensionality of its input by averaging along one dimension.
As described in the Keras documentation, this layer has a data_format argument. By default it is "channels_last", meaning the input is interpreted as (batch, steps, features): the last (features) axis is kept and the average is taken along the steps axis.
Here is an example model:
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Input, Embedding, GlobalAveragePooling1D

model = Sequential([
    Input((10,)),
    Embedding(30, 7, input_length=10),
    GlobalAveragePooling1D()  # default: data_format="channels_last"
])
model.summary()
output:
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding_1 (Embedding) (None, 10, 7) 210
global_average_pooling1d (G (None, 7) 0
lobalAveragePooling1D)
=================================================================
Total params: 210
Trainable params: 210
Non-trainable params: 0
_________________________________________________________________
As you can see, the shape for a single sample was reduced from (10, 7) to (7,): the layer returns the average of the 10 word embeddings, i.e. one 7-dimensional vector per sentence.
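The same reduction can be done by hand; a minimal sketch with standalone layers (random integer input, not the model above) showing that the default pooling is just a mean over the word axis:
import tensorflow as tf

texts = tf.random.uniform((2, 10), maxval=30, dtype=tf.int32)
embedding = tf.keras.layers.Embedding(30, 7, input_length=10)
pooling = tf.keras.layers.GlobalAveragePooling1D()     # default: channels_last
print(pooling(embedding(texts)).shape)                 # (2, 7)
print(tf.reduce_mean(embedding(texts), axis=1).shape)  # (2, 7): same values, averaged over the 10 words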
If you set data_format="channels_first", as here:
model = Sequential([
    Input((10,)),
    Embedding(30, 7, input_length=10),
    GlobalAveragePooling1D(data_format="channels_first")
])
model.summary()
output:
Model: "sequential_2"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding_1 (Embedding) (None, 10, 7) 210
global_average_pooling1d (G (None, 10) 0
lobalAveragePooling1D)
=================================================================
Total params: 210
Trainable params: 210
Non-trainable params: 0
_________________________________________________________________
Here the shape for a single sample was reduced from (10, 7) to (10,): the layer returns the average of the values within each word's embedding, i.e. one scalar per word. That arguably doesn't make much sense, since you could set the embedding dimension to 1 and get an output of the same shape, as sketched below.
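To illustrate that shape argument (only a sketch; with an embedding dimension of 1 the per-word values are learned directly instead of being averaged):
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Input, Embedding, Reshape

tiny = Sequential([
    Input((10,)),
    Embedding(30, 1, input_length=10),  # one scalar per word
    Reshape((10,))                      # (None, 10, 1) -> (None, 10)
])
tiny.summary()  # output shape (None, 10), just like the channels_first pooling above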