I'm trying to re-implement code written in TensorFlow in PyTorch, but I came across max pooling. I looked into the documentation of the two frameworks and found that their behavior is not the same. Can someone please explain why they differ, and which one is more efficient (I ask because they give different results)?
import tensorflow
from tensorflow.keras.layers import GlobalMaxPool1D
tf_tensor = tensorflow.random.normal([8, 6, 5])
tf_maxpool = GlobalMaxPool1D()
print("output shape : ", tf_maxpool(tf_tensor).shape)
output shape : (8, 5)
import torch
import torch.nn as nn
torch_tensor = torch.tensor(tf_tensor.numpy())
maxpool = nn.MaxPool1d(kernel_size=2)
print("output shape : ", maxpool(torch_tensor).shape)
output shape : torch.Size([8, 6, 2])
CodePudding user response:
Global maximum pooling has no window size; since it is global, it considers the whole sequence. The equivalent operation is simply torch.max applied along the sequence axis, i.e. axis=1 here (the tensor still uses TensorFlow's channels-last layout):
>>> maxpool = torch_tensor.max(1).values
>>> maxpool.shape
torch.Size([8, 5])
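As a quick sanity check (a minimal sketch reusing tf_tensor, tf_maxpool, and torch_tensor from the question; numpy is only needed for the comparison), the result matches what GlobalMaxPool1D produces:
import numpy as np
tf_out = tf_maxpool(tf_tensor).numpy()          # TensorFlow result, shape (8, 5)
torch_out = torch_tensor.max(1).values.numpy()  # PyTorch result, shape (8, 5)
print(np.allclose(tf_out, torch_out))           # True: both reduce over the sequence axis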
CodePudding user response:
MaxPool vs GlobalMaxPool
torch.nn.MaxPool1d pools every N adjacent values by taking their maximum.
For these values:
[1, 2, 3, 4, 5, 6, 7, 8]
with kernel_size=2 as you've specified, you would get the following values:
[2, 4, 6, 8]
which means a sliding window of size 2 takes the maximum value and moves on to the next pair.
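A minimal sketch of that windowed behaviour (the tensor values and variable names are only illustrative):
import torch
import torch.nn as nn
values = torch.tensor([[[1., 2., 3., 4., 5., 6., 7., 8.]]])  # shape (batch=1, channels=1, length=8)
windowed = nn.MaxPool1d(kernel_size=2)                       # stride defaults to kernel_size
print(windowed(values))                                      # tensor([[[2., 4., 6., 8.]]])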
Global pooling is a similar operation, but it takes the maximum value from the whole list, as pointed out in Ivan's answer. In our case, we would simply get a single value, 8.
In PyTorch, this operation is torch.nn.AdaptiveMaxPool1d with an output size of 1 (optionally followed by torch.nn.Flatten):
import torch
tensor = torch.randn(8, 6, 5)
global_max_pooling = torch.nn.Sequential(
torch.nn.AdaptiveMaxPool1d(1), # (8, 6, 1) shape
torch.nn.Flatten(), # (8, 6) after removing unnecessary 1 dimension
)
global_max_pooling(tensor) # (8, 6)
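For intuition, the same values can be obtained with a plain reduction over the last dimension; a quick check, reusing tensor and global_max_pooling from above:
print(torch.equal(global_max_pooling(tensor), tensor.max(dim=2).values))  # True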
The explanation above is simplified, as this operation is carried out across a specific dimension (the last one, in PyTorch's layout).
TensorFlow vs PyTorch shape difference
As one can notice, in the case of TensorFlow the output has shape (8, 5), while in the case of PyTorch it is (8, 6).
This difference stems from the assumed channels dimension, namely:
- PyTorch assumes a data layout of (batch, channels, sequence)
- TensorFlow assumes a data layout of (batch, sequence, channels) (a.k.a. channels last)
One has to permute the data in the PyTorch case to get exactly the same results:
tensor = tensor.permute(0, 2, 1) # (8, 5, 6)
global_max_pooling(tensor) # (8, 5)
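As a quick check tying this back to the other answer (a small sketch reusing torch_tensor from the question and the global_max_pooling module defined above), both routes produce identical values:
channels_first = torch_tensor.permute(0, 2, 1)          # (8, 5, 6): batch, channels, sequence
pooled = global_max_pooling(channels_first)             # (8, 5)
print(torch.equal(pooled, torch_tensor.max(1).values))  # True: matches the torch.max approach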
Efficiency
Use torch.nn.AdaptiveMaxPool1d when you want to perform pooling with a specified output size (different from 1), as it skips some unnecessary operations that torch.nn.MaxPool1d performs (going over the same elements more than once, which is out of scope of this question).
In the general case, when we perform global pooling, both are roughly equivalent and perform the same number of operations.
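If you want to verify this on your own hardware, a rough micro-benchmark sketch along these lines could be used (the tensor sizes and run count are arbitrary choices; torch.nn.MaxPool1d performs global pooling here by setting kernel_size to the full sequence length):
import torch
import torch.utils.benchmark as benchmark
x = torch.randn(64, 32, 1024)                         # (batch, channels, sequence), arbitrary sizes
adaptive = torch.nn.AdaptiveMaxPool1d(1)              # global pooling via adaptive output size 1
fixed = torch.nn.MaxPool1d(kernel_size=x.shape[-1])   # global pooling via one full-length window
for name, module in [("AdaptiveMaxPool1d", adaptive), ("MaxPool1d", fixed)]:
    timer = benchmark.Timer(stmt="module(x)", globals={"module": module, "x": x})
    print(name, timer.timeit(100))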