I am wondering whether I succeeded in translating the following multi-layer perceptron definition from PyTorch to Keras. In PyTorch, it was defined as:
from torch import nn

hidden = 128

def mlp(size_in, size_out, act=nn.ReLU):
    return nn.Sequential(
        nn.Linear(size_in, hidden),
        act(),
        nn.Linear(hidden, hidden),
        act(),
        nn.Linear(hidden, hidden),
        act(),
        nn.Linear(hidden, size_out),
    )
My translation is:
from tensorflow import keras
from tensorflow.keras import layers

hidden = 128

def mlp(size_in, size_out, act=keras.layers.ReLU):
    return keras.Sequential(
        [
            layers.Dense(hidden, activation=None, name="layer1", input_shape=(size_in, 1)),
            act(),
            layers.Dense(hidden, activation=None, name="layer2", input_shape=(hidden, 1)),
            act(),
            layers.Dense(hidden, activation=None, name="layer3", input_shape=(hidden, 1)),
            act(),
            layers.Dense(size_out, activation=None, name="layer4", input_shape=(hidden, 1)),
        ]
    )
I am particularly confused about the input/output shape arguments, because that seems to be where TensorFlow and PyTorch differ.
From the documentation:
When a popular kwarg input_shape is passed, then keras will create an input layer to insert before the current layer. This can be treated equivalent to explicitly defining an InputLayer.
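If I read that correctly, the following two models would be equivalent (my own illustrative sketch with a made-up input size of 10, not from the docs):

# Passing input_shape to the first layer...
model_a = keras.Sequential([layers.Dense(hidden, input_shape=(10,))])

# ...should be the same as explicitly defining the input:
model_b = keras.Sequential([layers.Input(shape=(10,)), layers.Dense(hidden)])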
So, did I get it right?
Thank you so much!
CodePudding user response:
In Keras, you should provide an input_shape for the first layer, or alternatively use a layers.Input layer. This is required because Keras builds the model ahead of time, creating the weight tensors, and for that it needs to know the input shape; PyTorch's nn.Linear instead receives its input size explicitly as an argument and resolves shapes at runtime.
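To illustrate (a sketch of my own, not part of the original answer): a Dense layer does not create its weights until it knows the size of its input:

dense = layers.Dense(hidden)
print(dense.weights)        # [] -- no weights exist yet
dense.build((None, 10))     # tell the layer its input shape
print([w.shape for w in dense.weights])  # kernel (10, 128) and bias (128,)

With the input shape declared up front, a working translation is: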
def keras_mlp(size_in, size_out, act=layers.ReLU):
    return keras.Sequential([layers.Input(shape=(size_in,)),
                             layers.Dense(hidden, name='layer1'),
                             act(),
                             layers.Dense(hidden, name='layer2'),
                             act(),
                             layers.Dense(hidden, name='layer3'),
                             act(),
                             layers.Dense(size_out, name='layer4')])
def pytorch_mlp(size_in, size_out, act=nn.ReLU):
    return nn.Sequential(nn.Linear(size_in, hidden),
                         act(),
                         nn.Linear(hidden, hidden),
                         act(),
                         nn.Linear(hidden, hidden),
                         act(),
                         nn.Linear(hidden, size_out))
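As a quick sanity check (my own addition, not part of the original answer), the trainable parameter counts of the two models should agree:

m_keras = keras_mlp(10, 5)
m_torch = pytorch_mlp(10, 5)
print(m_keras.count_params())                         # 35077
print(sum(p.numel() for p in m_torch.parameters()))  # 35077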
You can also compare their summaries.
For Keras:
>>> keras_mlp(10, 5).summary()
Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
layer1 (Dense)               (None, 128)               1408
re_lu_6 (ReLU)               (None, 128)               0
layer2 (Dense)               (None, 128)               16512
re_lu_7 (ReLU)               (None, 128)               0
layer3 (Dense)               (None, 128)               16512
re_lu_8 (ReLU)               (None, 128)               0
layer4 (Dense)               (None, 5)                 645
=================================================================
Total params: 35,077
Trainable params: 35,077
Non-trainable params: 0
_________________________________________________________________
For PyTorch:
>>> summary(pytorch_mlp(10, 5), (1, 10))
============================================================================
Layer (type:depth-idx)                   Output Shape              Param #
============================================================================
Sequential                               [1, 5]                    --
├─Linear: 1-1                            [1, 128]                  1,408
├─ReLU: 1-2                              [1, 128]                  --
├─Linear: 1-3                            [1, 128]                  16,512
├─ReLU: 1-4                              [1, 128]                  --
├─Linear: 1-5                            [1, 128]                  16,512
├─ReLU: 1-6                              [1, 128]                  --
├─Linear: 1-7                            [1, 5]                    645
============================================================================
Total params: 35,077
Trainable params: 35,077
Non-trainable params: 0
Total mult-adds (M): 0.04
============================================================================
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.14
Estimated Total Size (MB): 0.14
============================================================================
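Note that summary here is not part of core PyTorch; the output format above appears to come from the torchinfo package (an assumption on my part):

from torchinfo import summary  # assumed source of the summary() helper
summary(pytorch_mlp(10, 5), (1, 10))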