Feed three transformations (average, max, min) of pretrained embeddings into a single output layer in PyTorch


I have developed a simple feed-forward neural network in PyTorch.

The network uses pre-trained GloVe embeddings in a frozen nn.Embedding layer.

Next, the embedding output is transformed in three different ways, producing three separate representations. Each representation feeds an nn.Linear layer, and finally a single output layer produces the binary classification target.
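For reference, here is a minimal sketch of what such a module's constructor might look like, assuming a hypothetical hidden_size and a glove_weights tensor of shape [vocab_size, 50] (these names are illustrative, not taken from the original model):

  import torch
  import torch.nn as nn

  class SentenceClassifier(nn.Module):  # hypothetical class name
    def __init__(self, glove_weights, hidden_size):
      super().__init__()
      # Frozen pre-trained GloVe embeddings ([vocab_size, 50])
      self.embedding = nn.Embedding.from_pretrained(glove_weights, freeze=True)
      # One linear layer shared by the three pooled views, as in the forward() below
      self.input_layer = nn.Linear(50, hidden_size)
      self.activation = nn.ReLU()
      self.output_layer = nn.Linear(hidden_size, 1)  # single output for the binary target
      self.activation_output = nn.Sigmoid()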

The shape of the embedding tensor is [64, 150, 50]:
-> 64: sentences in the batch,
-> 150: words per sentence,
-> 50: vector size of a single word (the pre-trained GloVe vector).

So after the transformations, the embedding tensor is reduced to three tensors of shape [64, 50], where each is the torch.mean(), torch.max(), or torch.min() over the 150 words of each sentence.
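To illustrate the shapes, a small self-contained example with random tensors standing in for the real embeddings:

  import torch

  embedded = torch.rand(64, 150, 50)    # [batch, words per sentence, embedding dim]
  avg = torch.mean(embedded, dim=1)     # [64, 50]
  mx = torch.max(embedded, dim=1)[0]    # [64, 50]; max() returns (values, indices)
  mn = torch.min(embedded, dim=1)[0]    # [64, 50]
  print(avg.shape, mx.shape, mn.shape)  # torch.Size([64, 50]) each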

My questions are:

  1. How can I feed the outputs of the three nn.Linear layers into a single output layer to predict a single target value in [0, 1]?

  2. Is this efficient and helpful to the model's overall predictive power, or is taking just the average of the embeddings sufficient, with no improvement to be expected?

The forward() method of my PyTorch model is:

  def forward(self, text):

    embedded = self.embedding(text)
    if self.use_pretrained_embeddings:
      # Pool over the 150 words (dim=1); torch.max/torch.min return (values, indices),
      # so [0] keeps the values only
      embedded_average = torch.mean(embedded, dim=1)
      embedded_max = torch.max(embedded, dim=1)[0]
      embedded_min = torch.min(embedded, dim=1)[0]
    else:
      embedded = self.flatten_layer(embedded)

    # The same nn.Linear layer (same hidden size) is applied to each pooled view
    input_layer = self.input_layer(embedded_average)
    input_layer = self.activation(input_layer)

    input_layer_max = self.input_layer(embedded_max)
    input_layer_max = self.activation(input_layer_max)

    input_layer_min = self.input_layer(embedded_min)
    input_layer_min = self.activation(input_layer_min)

    # What should I do here to exploit the outputs of the three hidden branches?
    output_layer = self.output_layer(input_layer)
    output_layer = self.activation_output(output_layer)  # Sigmoid()

    return output_layer

After applying the proposed answer, the forward() method is:

  def forward(self, text):

    embedded = self.embedding(text)
    if self.use_pretrained_embeddings:
      embedded_average = torch.mean(embedded, dim=1)
      embedded_max = torch.max(embedded, dim=1)[0]
      embedded_min = torch.min(embedded, dim=1)[0]

      # Use of the average embeddings transformation
      input_layer_average = self.input_layer(embedded_average)
      input_layer_average = self.activation(input_layer_average)

      # Use of the max embeddings transformation
      input_layer_max = self.input_layer(embedded_max)
      input_layer_max = self.activation(input_layer_max)

      # Use of the min embeddings transformation
      input_layer_min = self.input_layer(embedded_min)
      input_layer_min = self.activation(input_layer_min)

    else:
      embedded = self.flatten_layer(embedded)

    input_layer = torch.concat([input_layer_average, input_layer_max, input_layer_min], dim=1)
    input_layer = self.activation(input_layer)

    print("3", input_layer.shape)  # [192, 1] here, vs. the [64, 1] expected by the output layer

    if self.n_layers != 0:
      for layer in self.layers:
        input_layer = layer(input_layer)

    output_layer = self.output_layer(input_layer)
    output_layer = self.activation_output(output_layer)

    return output_layer

This generates the following error:

ValueError: Using a target size (torch.Size([64, 1])) that is different to the input size (torch.Size([192, 1])) is deprecated. Please ensure they have the same size.

This outcome is expected, since the concatenated tensor is 3x the batch size (64). Is there a fix that could resolve it?

CodePudding user response:

Regarding 1: You can use torch.cat (or its alias torch.concat) to concatenate the three outputs along the feature dimension and then map the result to a single output using another linear layer, as in the sketch below.
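A minimal sketch, assuming each branch outputs a tensor of shape [64, hidden_size] (hidden_size is a hypothetical name for the shared hidden width):

  # Concatenate along the feature dimension (dim=1): batch size stays 64
  combined = torch.cat([input_layer_average, input_layer_max, input_layer_min], dim=1)  # [64, 3 * hidden_size]

  # The output layer must then accept 3 * hidden_size input features, e.g. in __init__:
  #   self.output_layer = nn.Linear(3 * hidden_size, 1)
  output = self.activation_output(self.output_layer(combined))  # [64, 1]

Regarding the follow-up error: an input of shape [192, 1] suggests the three [64, ...] tensors were effectively stacked along the batch dimension (dim=0) before the output layer. Concatenating along dim=1 as above keeps the batch size at 64, so the prediction matches the [64, 1] target. Note also that the second self.activation applied right after the concatenation is redundant, since each branch is already activated.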

Regarding 2: You will have to try it yourself and see whether this is useful.
