How can a Keras model train successfully, but then complain about incompatible layers afterwards?

I am using a simple model with three layers:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(6,), name="flatten"),
    tf.keras.layers.Dense(128, activation="relu", name="dense1"),
    tf.keras.layers.Dense(1, name="dense2")
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(0.001),
    loss=tf.keras.losses.MeanAbsoluteError()
)

Okay, this compiles successfully. I already have some data prepared for it, let's check:

print(features)
print(labels)

This prints two lists:

[[1.0, 0.6747252747252748, 0.5652173913043478, 0.6817120622568094, 0.48387096774193544, 0.8536585365853658], [1.0, 0.7692307692307693, 0.717391304347826, 0.7184824902723735, 0.4637096774193548, 0.8536585365853658], (many more features...)]
[18.0, 15.0, (many more labels, same amount as features...)]

Great. Now I'll train the model and print the history of losses:

print(
    model.fit(
        features,
        labels,
        verbose=0,
        epochs=100,
        validation_data=(features, labels)
    ).history["val_loss"]
)

This prints:

[22.92747688293457, 22.328025817871094, (many more epochs...), 3.36980938911438, 3.3660128116607666]

Great, the training has succeeded and the loss has gone down over time. Now I want to invoke the model manually:

print(
    model(
        features[0]
    )
)

But this complains:

ValueError: Layer "sequential" expects 1 input(s), but it received 6 input tensors. Inputs received: [<tf.Tensor: shape=(), dtype=float32, numpy=1.0>, <tf.Tensor: shape=(), dtype=float32, numpy=0.6747253>, <tf.Tensor: shape=(), dtype=float32, numpy=0.5652174>, <tf.Tensor: shape=(), dtype=float32, numpy=0.6817121>, <tf.Tensor: shape=(), dtype=float32, numpy=0.48387095>, <tf.Tensor: shape=(), dtype=float32, numpy=0.85365856>]

I don't see why I shouldn't be able to pass it as a list, given that a list was fine in the .fit call, but after some reading and trial and error I found a workaround using tf.constant:

print(
    model(
        tf.constant(features[0])
    )
)

But now it hits another error!

ValueError: Exception encountered when calling layer "sequential" (type Sequential).

Input 0 of layer "dense1" is incompatible with the layer: expected axis -1 of input shape to have value 6, but received input with shape (6, 1)

Call arguments received:
  • inputs=tf.Tensor(shape=(6,), dtype=float32)
  • training=None
  • mask=None

Seems like the second layer is somehow incompatible with the first one! What does that mean? What I absolutely don't understand is: if the layers are incompatible, how did the model compile in the first place? And worse, why did the training succeed? Surely, if the model compiles without complaining and the input passes through the first layer fine, there can't be a problem at the second layer?

What's going wrong here? To me this doesn't seem logical. There must be something I missed.

CodePudding user response:

First of all, why are you using a Flatten layer? You can take the Flatten layer out and use just:

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", name="dense1", input_shape=(6,)),
    tf.keras.layers.Dense(1, name="dense2")
])

Note that we've passed input_shape as the tuple (6,), which means each sample is a vector of 6 features; the batch dimension is added automatically, so the full input shape the model expects is (None, 6). If we run model.summary() we'll get

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense1 (Dense)              (None, 128)               896       
                                                                 
 dense2 (Dense)              (None, 1)                 129       
                                                                 
=================================================================
Total params: 1,025
Trainable params: 1,025
Non-trainable params: 0
_________________________________________________________________
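
A quick way to confirm what the model now expects (a small sketch, assuming the model defined above):

print(model.input_shape)   # (None, 6) – the batch dimension is added automatically
print(model.output_shape)  # (None, 1)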

Now we can fit as normal with

features = [[1.0, 0.6747252747252748, 0.5652173913043478, 0.6817120622568094, 0.48387096774193544, 0.8536585365853658], [1.0, 0.7692307692307693, 0.717391304347826, 0.7184824902723735, 0.4637096774193548, 0.8536585365853658]]
labels = [18.0, 15.0]

history = model.fit(
  features,
  labels,
  verbose=0,
  epochs=100,
  validation_data=(features, labels)
)
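
The reason plain Python lists are accepted here is that fit converts them to arrays internally. A minimal check (assuming numpy is imported as np) shows the shapes Keras ends up working with:

import numpy as np

print(np.asarray(features).shape)     # (2, 6) – what fit actually trains on
print(np.asarray(features[0]).shape)  # (6,)   – a single sample, with no batch dimension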

Now if you want to make a prediction you can do

# Using .predict 
model.predict(features)
>>> array([[15.692635], [16.437447]], dtype=float32)
model.predict([features[0]])
>>> array([[15.692635]], dtype=float32)
# Using functional way
model(np.array(features))
>>><tf.Tensor: shape=(2, 1), dtype=float32, numpy=array([[15.692635],[16.437447]],dtype=float32)>
model(np.array(features[0]).reshape(1,-1))
>>> <tf.Tensor: shape=(1, 1), dtype=float32, numpy=array([[15.692635]], dtype=float32)>

This difference between .predict and calling the model itself as a function comes down to the different implementations of the predict method and of __call__ on the Model class.

It seems that the predict method is more flexible and standardizes the input (for example, converting Python lists to arrays) before making the prediction, whereas calling the model directly uses the input as-is, so we need to pass it as a 2D array of shape (batch_size, 6).
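
If you do want to call the model directly on a single sample, one option is to add the batch dimension yourself first (a sketch, assuming tensorflow is imported as tf and the features list from above):

# Turn the single sample of shape (6,) into a batch of one with shape (1, 6)
single = tf.expand_dims(tf.constant(features[0]), axis=0)
print(model(single))  # <tf.Tensor: shape=(1, 1), dtype=float32, ...>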
