model.predict() - TensorFlow Keras gives same output for all images when the dataset size increases?

I have been trying to use a pre-trained model (XceptionNet) to get a feature vector for each input image for a classification task. But I am stuck because model.predict() gives unreliable, varying output vectors for the same image when the dataset size changes.

In the following code, batch is the data containing the images, and for each of these images I want a feature vector, which I am obtaining with the pre-trained model.

batch.shape
TensorShape([803, 800, 600, 3])

Just to make it clear that all the input images are different, here are a few of them displayed.

plt.imshow(batch[-23])
plt.figure()
plt.imshow(batch[-15])

My model is the following

import tensorflow as tf
from tensorflow.keras.applications import Xception
from tensorflow.keras.layers import Input, GlobalAvgPool2D

INPUT_SHAPE = (800, 600)

model_xception = Xception(weights="imagenet", input_shape=(*INPUT_SHAPE, 3), include_top=False)
model_xception.trainable = False
inp = Input(shape=(*INPUT_SHAPE, 3))
out = model_xception(inp, training=False)
output = GlobalAvgPool2D()(out)
model = tf.keras.Model(inp, output, name='Xception-kPiece')
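
With this setup each image should map to one fixed-length pooled feature vector; a quick check (just a sketch, where 2048 is Xception's final feature-map channel count) is:

# GlobalAvgPool2D collapses the spatial dimensions of Xception's last
# feature map, so every image yields a single 2048-dimensional vector.
print(model.output_shape)  # expected: (None, 2048)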

Now the issue can be seen in the following outputs:

model.predict(batch[-25:]) # prediction on the last 25 images

1/1 [==============================] - 1s 868ms/step

array([[4.99584060e-03, 4.25433293e-02, 9.93836671e-02, ...,
        3.21301445e-03, 2.59823762e-02, 9.08260979e-03],
       [2.50613055e-04, 1.18759666e-02, 0.00000000e+00, ...,
        1.77203789e-02, 7.71604702e-02, 1.28602296e-01],
       [3.41954082e-02, 1.82092339e-02, 5.07147610e-03, ...,
        7.09404126e-02, 9.45318267e-02, 2.69510925e-01],
       ...,
       [0.00000000e+00, 5.16504236e-03, 4.90547449e-04, ...,
        4.62833559e-04, 9.43152513e-03, 1.17826145e-02],
       [0.00000000e+00, 4.64747474e-03, 0.00000000e+00, ...,
        1.21422185e-04, 4.47714329e-03, 1.92385539e-02],
       [0.00000000e+00, 1.29655155e-03, 4.02751788e-02, ...,
        0.00000000e+00, 0.00000000e+00, 3.20959717e-01]], dtype=float32)
model.predict(batch)[-25:] # prediction on the entire dataset of 803 images, then extracting the vectors for the last 25 images

26/26 [==============================] - 34s 1s/step

array([[1.7320104e-05, 3.6561250e-04, 0.0000000e+00, ..., 0.0000000e+00,
        3.5924271e-02, 0.0000000e+00],
       [1.7320104e-05, 3.6561250e-04, 0.0000000e+00, ..., 0.0000000e+00,
        3.5924271e-02, 0.0000000e+00],
       [1.7320104e-05, 3.6561250e-04, 0.0000000e+00, ..., 0.0000000e+00,
        3.5924271e-02, 0.0000000e+00],
       ...,
       [1.7318112e-05, 3.6561041e-04, 0.0000000e+00, ..., 0.0000000e+00,
        3.5924841e-02, 0.0000000e+00],
       [1.7318112e-05, 3.6561041e-04, 0.0000000e+00, ..., 0.0000000e+00,
        3.5924841e-02, 0.0000000e+00],
       [1.7318112e-05, 3.6561041e-04, 0.0000000e+00, ..., 0.0000000e+00,
        3.5924841e-02, 0.0000000e+00]], dtype=float32)

There are two problems with this behavior:

  • The two outputs are not the same, even though the last 25 input images are identical.
  • Within the larger batch, the output is the same for every input image.

My take on the problem:

  • I suspect the BatchNormalization layers are causing the issue. But what is the fix? I am calling model_xception with training=False and have set model_xception.trainable=False, yet the output is still the same for all inputs (a quick sanity check for this is sketched right after this list).
  • The increase in the number of images in the batch seems to be the problem.
  • The issue is not specific to XceptionNet; it shows up for all other models as well. I have also experimented with the EfficientNetV2 models.
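
To check whether the frozen BatchNormalization layers really are the culprit, one minimal sanity check (just a sketch, using the batch and model objects defined above) is to compare the feature vector of a single image predicted on its own against the same image predicted inside batches of different sizes:

import numpy as np

# Feature vector of the last image, predicted on its own
single = model.predict(batch[-1:])

# The same image predicted as part of progressively larger batches
for n in (25, 100, 400, 803):
    in_batch = model.predict(batch[-n:])[-1:]
    # With a deterministic, batch-independent model these should agree
    # up to floating-point precision.
    print(n, np.allclose(single, in_batch, atol=1e-5))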

Can anyone help fix the bug?

CodePudding user response:

1 Both the outputs are not the same, but the last 25 input images are the same.

  1. Getting different results for the same image is not necessarily a bug; it depends on how the inputs reach the network:

    1.1 The learned function itself is fixed after training, so the same input pipeline should always produce the same output pattern.

    1.2 What actually arrives at the output layer depends on the data preparation: measurement scales, zooming, alignment, contrast, whether the inputs are mapped to the 0-to-1 range, the network type, and so on (a preprocessing sketch is given at the end of this answer).

2 The output for each input image in the larger batch is the same.

  1. Try changing the input data: does it still produce the same collapsed results?
  2. Check the global average pooling, the convolution layers, and any normalization layer applied at an earlier step.
  3. Whether the model runs in training or inference mode changes the result; predicting on data similar to what the model was trained on gives better results, and a mismatch can make the outputs unstable.

3 The increase in the number of images in the batch is the problem.

  1. Using a callback function you can limit the values to acceptable ranges with your own criteria.

4 Not only for XceptionNet but for all other models this issue is evident. I have also experimented with EfficientNetV2 models.

  1. It should still work; try a different number of outputs or a different output-layer function.

With pictures it is much easier to see what differs than with text characters, since images carry boundary information; to compare, look at the output of the earlier normalization layer.
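
If the 0-to-1 input scaling turns out to be the issue: Xception was trained on inputs scaled to the [-1, 1] range, so a minimal preprocessing sketch (assuming batch holds raw 0-255 pixel values, which is not stated in the question) would be:

import tensorflow as tf
from tensorflow.keras.applications.xception import preprocess_input

# Xception expects pixels scaled to [-1, 1]; feeding raw 0-255 values
# means the activations no longer match the stored BatchNorm statistics.
preprocessed = preprocess_input(tf.cast(batch, tf.float32))
features = model.predict(preprocessed)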

CodePudding user response:

The issue seems to appear because I am using tensorflow-macos, which has a major bug where predictions become wrong once the number of input images exceeds a particular threshold.

See the issue in action below:

  • When 57 input images are used, the predictions are distinct per image and consistent with the results for 56, ..., 1 input images (which is the expected behavior).
model.predict(batch[-57:])

1/1 [==============================] - 2s 2s/step

array([[0.00000000e+00, 2.56574154e-02, 1.79693177e-01, ...,
        2.85670068e-03, 1.08444700e-02, 2.34257965e-03],
       [0.00000000e+00, 1.28444552e-03, 0.00000000e+00, ...,
        4.11680201e-03, 4.49061068e-03, 1.83695972e-01],
       [0.00000000e+00, 2.29660165e-03, 7.84890354e-03, ...,
        1.86224483e-04, 1.81426702e-03, 1.54079705e-01],
       ...,
       [0.00000000e+00, 5.16504236e-03, 4.90547449e-04, ...,
        4.62833559e-04, 9.43152513e-03, 1.17826145e-02],
       [0.00000000e+00, 4.64747474e-03, 0.00000000e+00, ...,
        1.21422185e-04, 4.47714329e-03, 1.92385539e-02],
       [0.00000000e+00, 1.29655155e-03, 4.02751788e-02, ...,
        0.00000000e+00, 0.00000000e+00, 3.20959717e-01]], dtype=float32)

model.predict(batch[-55:])

2/2 [==============================] - 2s 1s/step

array([[0.00000000e+00, 2.29660165e-03, 7.84890354e-03, ...,
        1.86224483e-04, 1.81426702e-03, 1.54079705e-01],
       [4.94572960e-05, 8.04292504e-04, 5.08825444e-02, ...,
        4.58029518e-03, 2.09121332e-02, 5.57549708e-02],
       [0.00000000e+00, 1.62312540e-03, 0.00000000e+00, ...,
        4.35817856e-05, 2.16606092e-02, 1.30677417e-01],
       ...,
       [0.00000000e+00, 5.16504236e-03, 4.90547449e-04, ...,
        4.62833559e-04, 9.43152513e-03, 1.17826145e-02],
       [0.00000000e+00, 4.64747474e-03, 0.00000000e+00, ...,
        1.21422185e-04, 4.47714329e-03, 1.92385539e-02],
       [0.00000000e+00, 1.29655155e-03, 4.02751788e-02, ...,
        0.00000000e+00, 0.00000000e+00, 3.20959717e-01]], dtype=float32)
  • But when the number of input images is increased to 58 or more, the above-mentioned issue appears.
model.predict(batch[-58:])

1/1 [==============================] - 2s 2s/step

array([[5.3905282e-04, 2.8516021e-02, 1.2775734e-03, ..., 5.4674568e-03,
        1.7451918e-02, 9.4717339e-02],
       [0.0000000e+00, 2.8345605e-02, 1.2786543e-03, ..., 0.0000000e+00,
        2.4870334e-03, 1.2716405e-01],
       [4.3588653e-03, 8.2868971e-02, 1.8764129e-02, ..., 2.5320805e-03,
        5.9973758e-02, 6.9927111e-02],
       ...,
       [1.7320104e-05, 3.6561250e-04, 0.0000000e+00, ..., 0.0000000e+00,
        3.5924271e-02, 0.0000000e+00],
       [1.7320104e-05, 3.6561250e-04, 0.0000000e+00, ..., 0.0000000e+00,
        3.5924271e-02, 0.0000000e+00],
       [1.7320104e-05, 3.6561250e-04, 0.0000000e+00, ..., 0.0000000e+00,
        3.5924271e-02, 0.0000000e+00]], dtype=float32)

If anyone could suggest a fix or workaround while still using TensorFlow on a Mac, it would be really helpful.
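
In the meantime, one possible workaround (just a sketch; predict_in_chunks is an illustrative helper name, and 32 is an arbitrary chunk size chosen to stay well below the ~57-image threshold observed above) is to never pass model.predict() more images than that threshold at once and stitch the per-chunk results back together:

import numpy as np

def predict_in_chunks(model, data, chunk_size=32):
    # Keep every predict() call below the size at which the outputs
    # start to collapse, then concatenate the per-chunk results.
    parts = [model.predict(data[i:i + chunk_size])
             for i in range(0, int(data.shape[0]), chunk_size)]
    return np.concatenate(parts, axis=0)

features = predict_in_chunks(model, batch)  # same shape as model.predict(batch)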

There is also a github issue which is still not fixed here.
