I am using a simple model with three layers:
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(6,), name="flatten"),
    tf.keras.layers.Dense(128, activation="relu", name="dense1"),
    tf.keras.layers.Dense(1, name="dense2")
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(0.001),
    loss=tf.keras.losses.MeanAbsoluteError()
)
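(For reference, model.summary() at this point shows the layer shapes; the parameter counts in the comments below follow from the layer sizes, and the exact summary formatting depends on the TensorFlow version.)
# Shapes and parameter counts implied by the layers above:
#   flatten: (None, 6)   -> (None, 6),   0 params
#   dense1:  (None, 6)   -> (None, 128), 6*128 + 128 = 896 params
#   dense2:  (None, 128) -> (None, 1),   128 + 1 = 129 params
model.summary()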
Okay, this compiles successfully. I already have some data prepared for it, let's check:
print(features)
print(labels)
This prints two lists:
[[1.0, 0.6747252747252748, 0.5652173913043478, 0.6817120622568094, 0.48387096774193544, 0.8536585365853658], [1.0, 0.7692307692307693, 0.717391304347826, 0.7184824902723735, 0.4637096774193548, 0.8536585365853658], (many more features...)]
[18.0, 15.0, (many more labels, same amount as features...)]
Great. Now I'll train the model and print the history of losses:
print(
    model.fit(
        features,
        labels,
        verbose=0,
        epochs=100,
        validation_data=(features, labels)
    ).history["val_loss"]
)
This prints:
[22.92747688293457, 22.328025817871094, (many more epochs...), 3.36980938911438, 3.3660128116607666]
Great, the training has succeeded and the loss has gone down over time. Now I want to invoke the model manually:
print(
    model(
        features[0]
    )
)
But this complains:
ValueError: Layer "sequential" expects 1 input(s), but it received 6 input tensors. Inputs received: [<tf.Tensor: shape=(), dtype=float32, numpy=1.0>, <tf.Tensor: shape=(), dtype=float32, numpy=0.6747253>, <tf.Tensor: shape=(), dtype=float32, numpy=0.5652174>, <tf.Tensor: shape=(), dtype=float32, numpy=0.6817121>, <tf.Tensor: shape=(), dtype=float32, numpy=0.48387095>, <tf.Tensor: shape=(), dtype=float32, numpy=0.85365856>]
I don't see why I shouldn't be able to pass it as a list, given that a list was fine in the .fit call, but after some reading and trial and error I found a workaround using tf.constant:
print(
    model(
        tf.constant(features[0])
    )
)
But now it hits another error!
ValueError: Exception encountered when calling layer "sequential" (type Sequential).
Input 0 of layer "dense1" is incompatible with the layer: expected axis -1 of input shape to have value 6, but received input with shape (6, 1)
Call arguments received:
• inputs=tf.Tensor(shape=(6,), dtype=float32)
• training=None
• mask=None
Seems like the second layer is somehow incompatible with the first one! What does that mean? What I absolutely don't understand is: if the layers are incompatible, how did this compile to begin with? And, much worse, why did the training succeed? Surely, if the model compiles without complaining and the input passes through the first layer fine, there can't possibly be a problem at the second layer?
What's going wrong here? This doesn't seem logical to me. There must be something I missed.
CodePudding user response:
First of all, why are you using a Flatten layer? Each sample is already a flat vector of 6 values, so you can take the Flatten layer out and use just:
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", name="dense1", input_shape=(6,)),
    tf.keras.layers.Dense(1, name="dense2")
])
Note that we pass input_shape=(6,), which is the shape of a single sample; Keras adds the batch dimension itself, so the model's full input shape is (None, 6). If we run model.summary() we get:
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense1 (Dense) (None, 128) 896
dense2 (Dense) (None, 1) 129
=================================================================
Total params: 1,025
Trainable params: 1,025
Non-trainable params: 0
_________________________________________________________________
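The None in the output shapes is the batch dimension Keras adds for you, and the parameter counts check out: dense1 has 6*128 + 128 = 896 weights and dense2 has 128 + 1 = 129. If you want to see exactly what the model expects as input, model.input_shape shows it directly:
print(model.input_shape)  # (None, 6): any batch size, 6 features per sample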
Now we can fit normally with:
features = [[1.0, 0.6747252747252748, 0.5652173913043478, 0.6817120622568094, 0.48387096774193544, 0.8536585365853658], [1.0, 0.7692307692307693, 0.717391304347826, 0.7184824902723735, 0.4637096774193548, 0.8536585365853658]]
labels = [18.0, 15.0]
history = model.fit(
    features,
    labels,
    verbose=0,
    epochs=100,
    validation_data=(features, labels)
)
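As a quick check that training ran, the history object works just like in your own snippet; for example, the final validation loss:
print(history.history["val_loss"][-1])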
Now if you want to make a prediction you can do:
import numpy as np

# Using .predict
model.predict(features)
>>> array([[15.692635], [16.437447]], dtype=float32)
model.predict([features[0]])
>>> array([[15.692635]], dtype=float32)
# Using functional way
model(np.array(features))
>>><tf.Tensor: shape=(2, 1), dtype=float32, numpy=array([[15.692635],[16.437447]],dtype=float32)>
model(np.array(features[0]).reshape(1,-1))
>>> <tf.Tensor: shape=(1, 1), dtype=float32, numpy=array([[15.692635]], dtype=float32)>
This difference between .predict and calling the model itself as a function comes down to the different implementations of the predict method and __call__ on the Model class.
The predict method is more flexible: it runs the input through Keras's input-adaptation machinery, so plain Python lists get converted and batched for you. Calling the model directly uses the input essentially as-is, which is why we need to pass it as a 2D array with an explicit batch dimension, i.e. shape (batch_size, 6).
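Coming back to your original call: model(...) works as soon as the single sample gets an explicit batch dimension, so that its shape is (1, 6) rather than (6,). Any of these (equivalent sketches) would do:
# Give the single sample a batch dimension of 1, i.e. shape (1, 6)
model(tf.constant([features[0]]))                    # wrap it in an extra list
model(tf.expand_dims(tf.constant(features[0]), 0))   # or add the batch axis explicitly
model(np.array(features[0]).reshape(1, -1))          # or reshape, as above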