output_shape of custom keras layer is None (or cannot be determined automatically)


I am building a custom Keras layer that is essentially the softmax function with a trainable base parameter. While the layer works on its own, when it is placed inside a sequential model, model.summary() determines its output shape as None and model.fit() raises a presumably related exception:

ValueError: as_list() is not defined on an unknown TensorShape.

In other custom layers (including, obviously, the Linear example from the Keras guide), the output shape can be determined after .build() is called. Looking at model.summary()'s source code, as well as keras.layers.Layer, there is a @property Layer.output_shape that fails to determine the output shape automatically.
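
For reference, this is roughly the Linear example I mean (a minimal sketch along the lines of the Keras custom-layer guide, reproduced from memory), whose output shape summary() infers without trouble once .build() has run:

import tensorflow as tf
from tensorflow import keras

class Linear(keras.layers.Layer):
    def __init__(self, units=32):
        super().__init__()
        self.units = units

    def build(self, input_shape):
        # Weights are created lazily, once the input shape is known
        self.w = self.add_weight(
            shape=(input_shape[-1], self.units),
            initializer="random_normal",
            trainable=True,
        )
        self.b = self.add_weight(
            shape=(self.units,), initializer="zeros", trainable=True
        )

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b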

I then tried overriding the property and manually returning the input_shape argument passed to my layer's .build() method after saving it (softmax does not change the shape of its input), but this didn't work either: if I make a call to super().output_shape before returning my value, model.summary() determines the shape as ?, while if I don't, the value shown may seem correct, but in both cases I get exactly the same error during .fit().

Is there something special about the code inside call() that prevents Keras from understanding the shape of the output?
Alternatively, is there a piece of documentation I have missed?

My layer:

class B_Softmax(keras.layers.Layer):
    def __init__(self, b_init_mean=10, b_init_var=0.001):
        super(B_Softmax, self).__init__()
        self.b_init = tf.random_normal_initializer(b_init_mean, b_init_var)
        self._out_shape = None
        
    def build(self, input_shape):
        self.b = tf.Variable(
            initial_value = self.b_init(shape=(1,), dtype='float32'),
            trainable=True
        )
        self._out_shape = input_shape

    def call(self, inputs):
        # This is an implementation of Softmax for batched inputs
        # where the factor b is added to the exponents
        nominators  = tf.math.exp(self.b * inputs)
        denominator = tf.reduce_sum(nominators, axis=1)
        denominator = tf.squeeze(denominator)
        denominator = tf.expand_dims(denominator, -1)
        s           = tf.divide(nominators, denominator)
        return s

    @property
    def output_shape(self):    # If I comment out this function, summary prints 'None'
        super().output_shape   # If I leave this line, summary prints '?' 
        return self._out_shape # If the above line is commented out, summary prints '10' (correctly)
                               # but the same error is triggered in all three cases

The layer works on its own:

>>> A     = tf.constant([[1,2,3], [7,5,6]], dtype="float32")
>>> layer = B_Softmax(1.0)
>>> layer(A)
<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[0.08991686, 0.24461554, 0.6654676 ],
       [0.6654677 , 0.08991687, 0.24461551]], dtype=float32)>

But when I try to include it inside a model, the summary doesn't look right:

input_dim = 5
model = keras.Sequential([
        Dense(32, activation='relu', input_shape=(input_dim,)),
        Dense(num_classes, activation="softmax"),
        B_Softmax(1.0)
])
model.summary()
Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense_10 (Dense)            (None, 32)                192       
                                                                 
 dense_11 (Dense)            (None, 10)                330       
                                                                 
 b__softmax_18 (B_Softmax)   None   <-- "None", "?", or "10" (in a hacky way) may be printed
                                                                 
=================================================================
Total params: 523
Trainable params: 523
Non-trainable params: 0

And training fails:

batch_size = 128
epochs = 15
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_split=0.1)
ValueError: in user code:

    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1051, in train_function  *
        return step_function(self, iterator)
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1040, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1030, in run_step  **
        outputs = model.train_step(data)
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 894, in train_step
        return self.compute_metrics(x, y, y_pred, sample_weight)
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 987, in compute_metrics
        self.compiled_metrics.update_state(y, y_pred, sample_weight)
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/compile_utils.py", line 480, in update_state
        self.build(y_pred, y_true)
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/compile_utils.py", line 398, in build
        y_pred)
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/compile_utils.py", line 526, in _get_metric_objects
        return [self._get_metric_object(m, y_t, y_p) for m in metrics]
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/compile_utils.py", line 526, in <listcomp>
        return [self._get_metric_object(m, y_t, y_p) for m in metrics]
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/compile_utils.py", line 548, in _get_metric_object
        y_p_rank = len(y_p.shape.as_list())

    ValueError: as_list() is not defined on an unknown TensorShape.

CodePudding user response:

You could implement the compute_output_shape method in your Layer subclass:

def compute_output_shape(self, input_shape):
    return [(None, out_shape)]

Here out_shape is the dimensionality of the output; you can also replace the whole tuple to return any output shape you want.
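
For this particular layer, since softmax does not change the shape of its input, a minimal sketch (for the question's B_Softmax class) could simply hand back the input shape:

def compute_output_shape(self, input_shape):
    # Softmax preserves the shape of its input,
    # so the output shape equals the input shape
    return input_shape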

CodePudding user response:

This doesn't directly solve the issue but rather side-steps it: instead of using squeeze and expand_dims (the former of which seems to be problematic for TensorFlow to keep track of), we use keepdims=True in the summation, which keeps the axes aligned correctly for the softmax denominator.

def call(self, inputs):
    # This is an implementation of Softmax for batched inputs
    # where the factor b is added to the exponents
    nominators  = tf.math.exp(self.b * inputs)
    denominator = tf.reduce_sum(nominators, axis=1, keepdims=True)
    s           = tf.divide(nominators, denominator)
    return s

Arguably, it would be much preferable to make use of the built-in softmax:

def call(self, inputs):
    return tf.nn.softmax(self.b * inputs)
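
With either version of call(), the static shape flows through the layer unchanged. A quick sanity check (a sketch assuming keras is imported and the question's B_Softmax class is defined with the updated call(), with num_classes = 10) would be:

input_dim = 5
num_classes = 10
model = keras.Sequential([
    keras.layers.Dense(32, activation='relu', input_shape=(input_dim,)),
    keras.layers.Dense(num_classes, activation="softmax"),
    B_Softmax(1.0),
])
model.summary()  # B_Softmax should now report an output shape of (None, 10)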

CodePudding user response:

I found no problem with the code itself; I think the input parameters are what matter. A few remarks on the changes I made:

  1. I use a tf.data.Dataset with model.fit(), and I use it without a validation split because I create a sample with only one record.
  2. Model input: I changed it from Input(5,) to Input(1, 5) to match the shape the dataset creates (that is why an extra 1 appears in the Output Shape column of the summary).
  3. The batch size, number of classes, loss function, and optimizer are adjusted to the output dimensions of the network via the num_classes parameter.

Sample: << The loss function does not decide the model; the input and output shapes do. It is not necessary to show how everything was created; remove or comment out whatever is not used. >>

import tensorflow as tf

"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Class / Definition
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
class B_Softmax(tf.keras.layers.Layer):
    def __init__(self, b_init_mean=10, b_init_var=0.001):
        super(B_Softmax, self).__init__()
        self.b_init = tf.random_normal_initializer(b_init_mean, b_init_var)
        self._out_shape = None
        
    def build(self, input_shape):
        self.b = tf.Variable(
            initial_value = self.b_init(shape=(1,), dtype='float32'),
            trainable=True
        )
        self._out_shape = input_shape

    def call(self, inputs):
        # This is an implementation of Softmax for batched inputs
        # where the factor b is added to the exponents
        nominators  = tf.math.exp(self.b * inputs)
        denominator = tf.reduce_sum(nominators, axis=1)
        denominator = tf.squeeze(denominator)
        denominator = tf.expand_dims(denominator, -1)
        s           = tf.divide(nominators, denominator)
        return s

    @property
    def output_shape(self):    # If I comment out this function, summary prints 'None'
        super().output_shape   # If I leave this line, summary prints '?' 
        return self._out_shape # If the above line is commented out, summary prints '10' (correctly)
                               # but the same error is triggered in all three cases

"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Variables
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""                              
A = tf.constant([[1,2,3], [7,5,6]], dtype="float32")

batch_size = 128
epochs = 15
input_dim = 5
num_classes = 1

"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Dataset
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
start = 3
limit = 16
delta = 3
sample = tf.range( start, limit, delta )
sample = tf.cast( sample, dtype=tf.float32 )
sample = tf.constant( sample, shape=( 1, 1, 1, 5 ) )
dataset = tf.data.Dataset.from_tensor_slices(( sample, tf.constant( [0], shape=( 1, 1, 1, 1 ), dtype=tf.int64)))

"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Model Initialize
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""   
layer = B_Softmax(1.0)
print( layer(A) )

model = tf.keras.Sequential([
        tf.keras.layers.Dense(32, activation='relu', input_shape=(1, input_dim)),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
        B_Softmax(1.0)
])
model.summary()

model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Working
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""   
model.fit(dataset, batch_size=batch_size, epochs=epochs, validation_data=dataset)

Output:

tf.Tensor(
[[0.09016796 0.24486478 0.6649673 ]
 [0.6649673  0.09016798 0.24486472]], shape=(2, 3), dtype=float32)
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 dense (Dense)               (None, 1, 32)             192

 dense_1 (Dense)             (None, 1, 1)              33

 b__softmax_1 (B_Softmax)    ?                         1

=================================================================
Total params: 226
Trainable params: 226
Non-trainable params: 0
_________________________________________________________________
Epoch 1/15
1/1 [==============================] - 4s 4s/step - loss: 0.0000e+00 - accuracy: 0.0000e+00 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 2/15
1/1 [==============================] - 0s 13ms/step - loss: 0.0000e+00 - accuracy: 0.0000e+00 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 3/15
1/1 [==============================] - 0s 14ms/step - loss: 0.0000e+00 - accuracy: 0.0000e+00 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 4/15
1/1 [==============================] - 0s 14ms/step - loss: 0.0000e+00 - accuracy: 0.0000e+00 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 5/15
1/1 [==============================] - 0s 14ms/step - loss: 0.0000e+00 - accuracy: 0.0000e+00 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 6/15
1/1 [==============================] - 0s 14ms/step - loss: 0.0000e+00 - accuracy: 0.0000e+00 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 7/15
1/1 [==============================] - 0s 14ms/step - loss: 0.0000e+00 - accuracy: 0.0000e+00 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 8/15
1/1 [==============================] - 0s 13ms/step - loss: 0.0000e+00 - accuracy: 0.0000e+00 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 9/15
1/1 [==============================] - 0s 12ms/step - loss: 0.0000e+00 - accuracy: 0.0000e+00 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 10/15
1/1 [==============================] - 0s 15ms/step - loss: 0.0000e+00 - accuracy: 0.0000e+00 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 11/15
1/1 [==============================] - 0s 14ms/step - loss: 0.0000e+00 - accuracy: 0.0000e+00 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 12/15
1/1 [==============================] - 0s 15ms/step - loss: 0.0000e+00 - accuracy: 0.0000e+00 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 13/15
1/1 [==============================] - 0s 15ms/step - loss: 0.0000e+00 - accuracy: 0.0000e+00 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 14/15
1/1 [==============================] - 0s 13ms/step - loss: 0.0000e+00 - accuracy: 0.0000e+00 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 15/15
1/1 [==============================] - 0s 13ms/step - loss: 0.0000e+00 - accuracy: 0.0000e+00 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
