BERT outputs explained

The keys of the BERT encoder's output are default, encoder_outputs, pooled_output and sequence_output.

As far as I know, encoder_outputs are the outputs of each encoder layer, pooled_output is a representation of the global context, and sequence_output is the contextual output for each token (please correct me if I'm wrong). But what is default? Can you give me a more detailed explanation of each one?

This is the link to the encoder

CodePudding user response:

The TensorFlow docs provide a very good explanation of the outputs you are asking about:

The BERT models return a map with 3 important keys: pooled_output, sequence_output, encoder_outputs:

pooled_output represents each input sequence as a whole. The shape is [batch_size, H]. You can think of this as an embedding for the entire movie review.

sequence_output represents each input token in the context. The shape is [batch_size, seq_length, H]. You can think of this as a contextual embedding for every token in the movie review.

encoder_outputs are the intermediate activations of the L Transformer blocks. outputs["encoder_outputs"][i] is a Tensor of shape [batch_size, seq_length, 1024] with the outputs of the i-th Transformer block, for 0 <= i < L (the last dimension is the hidden size H, which is 1024 in the docs' example). The last value of the list is equal to sequence_output.
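To make the shapes above concrete, here is a minimal sketch that uses plain nested lists as a stand-in for the encoder's output map. The values and sizes (batch_size=2, seq_length=5, H=4, L=3) are made up for illustration, and the shape helper is hypothetical; a real encoder returns tensors, not lists.

```python
# Toy stand-in for the encoder's output map, built from plain nested lists.
# All numbers are made up; only the shapes and the relationship between
# the keys are meant to mirror the description above.

def shape(x):
    """Return the shape of a rectangular nested list as a tuple (hypothetical helper)."""
    dims = []
    while isinstance(x, list):
        dims.append(len(x))
        x = x[0]
    return tuple(dims)

batch_size, seq_length, H, L = 2, 5, 4, 3  # assumed toy sizes

# One [batch_size, seq_length, H] activation per Transformer block.
encoder_outputs = [
    [[[float(block + 1)] * H for _ in range(seq_length)] for _ in range(batch_size)]
    for block in range(L)
]

outputs = {
    "encoder_outputs": encoder_outputs,
    # sequence_output is the activation of the last Transformer block.
    "sequence_output": encoder_outputs[-1],
    # One vector of size H per input sequence (placeholder values).
    "pooled_output": [[0.0] * H for _ in range(batch_size)],
}

print("pooled_output  ", shape(outputs["pooled_output"]))
print("sequence_output", shape(outputs["sequence_output"]))
print("encoder_outputs", len(outputs["encoder_outputs"]), "blocks of",
      shape(outputs["encoder_outputs"][0]))
```

With a real encoder, the same checks can be done by reading the .shape attribute of each returned tensor.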

Here is another interesting discussion on the difference between the pooled_output and sequence_output, if you are interested.
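One way to picture that difference: in the original BERT design, the pooled output is computed from the final-layer hidden state of the first ([CLS]) token, passed through a dense layer with a tanh activation. The sketch below imitates that step in plain Python. The pool function, the random weights, and the toy sizes are assumptions for illustration only, not the values or code of any real checkpoint.

```python
import math
import random

def pool(sequence_output, weights, bias):
    """Toy BERT-style pooler: tanh(W . h_first + b) for each sequence in the batch."""
    pooled = []
    for tokens in sequence_output:      # tokens has shape [seq_length, H]
        first = tokens[0]               # hidden state of the first ([CLS]) token
        row = [
            math.tanh(sum(w * h for w, h in zip(w_row, first)) + b)
            for w_row, b in zip(weights, bias)
        ]
        pooled.append(row)
    return pooled                       # shape [batch_size, H]

random.seed(0)
batch_size, seq_length, H = 2, 5, 4     # assumed toy sizes

# Made-up stand-ins for the encoder's sequence_output and the pooler's parameters.
sequence_output = [
    [[random.uniform(-1, 1) for _ in range(H)] for _ in range(seq_length)]
    for _ in range(batch_size)
]
weights = [[random.uniform(-1, 1) for _ in range(H)] for _ in range(H)]
bias = [0.0] * H

pooled_output = pool(sequence_output, weights, bias)
print(len(pooled_output), len(pooled_output[0]))  # batch_size and H
```

The pooled vector still reflects the whole input, because the first token's final hidden state has already attended to every other token inside the encoder. sequence_output keeps one vector per token and is the natural choice for token-level tasks.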

The default output is equal to the pooled_output, which you can verify here:

import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text as text  # registers the custom ops the preprocessing model needs

tfhub_handle_preprocess = 'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3'
tfhub_handle_encoder = 'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-4_H-512_A-8/1'

def build_classifier_model(name):
    # Raw strings go in; the preprocessing layer tokenizes and packs them for BERT.
    text_input = tf.keras.layers.Input(shape=(), dtype=tf.string, name='features')
    bert_preprocess_model = hub.KerasLayer(tfhub_handle_preprocess, name='preprocessing')
    encoder_inputs = bert_preprocess_model(text_input)
    encoder = hub.KerasLayer(tfhub_handle_encoder)
    # The encoder returns a dict of outputs; pick one of them by key.
    outputs = encoder(encoder_inputs)
    net = outputs[name]
    return tf.keras.Model(text_input, net)

sentence = tf.constant([
"Improve the physical fitness of your goldfish by getting him a bicycle"
])

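# Build one model per output key and run the same sentence through both.
classifier_model = build_classifier_model(name='default')
default_output = classifier_model(sentence)

classifier_model = build_classifier_model(name='pooled_output')
pooled_output = classifier_model(sentence)

# Element-wise comparison: every entry is True when the two outputs match.
print(default_output == pooled_output)
# Collapse the comparison into a single boolean.
print(tf.reduce_all(default_output == pooled_output).numpy())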