SSD-MobileNetV2 300x300 - TensorFlow Object Detection API

I have fine-tuned an SSD-MobileNetV2 model built with the TensorFlow Object Detection API, using a fixed 300x300 resize in the training config, and saved it in the TF SavedModel format. Questions:

How, during inference, is the model able to accept input images of any shape (not just 300x300) without my having to resize them to 300x300 before passing them in?
Is it because a SavedModel resizes its inputs by default during inference? (If so, does it also normalize them before the convolution operations?) I am new to the SavedModel format, and I suspect the format itself is not responsible. But then how is this possible, given that I think SSD-MobileNet includes FC layers, which require a fixed input size? Or does the architecture use adaptive pooling in between to achieve this?

CodePudding user response：

When you make predictions, you must use images of the SAME size as the ones the model was trained on. So if you converted your 300 X 300 images to 224 X 224 for training, you must do the same with the images you want to predict on. MobileNet also expects pixels in the range -1 to 1; the function tf.keras.applications.mobilenet_v2.preprocess_input performs that operation. You need to scale the pixels of the images you wish to predict on in the same way. You can use the function mentioned, or the equivalent function shown below. Also, if the model was trained on RGB images, make sure the images you want to predict on are RGB.

def scale(image):
    # Map pixel values from [0, 255] to [-1, 1].
    return image / 127.5 - 1
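
As a quick sanity check of the scaling above, the sketch below applies the same formula to plain Python numbers instead of an image array. The helper `unscale` is hypothetical and is included only to show that the mapping is reversible.

```python
def scale(pixel):
    # Map a pixel value from [0, 255] to [-1, 1], same formula as above.
    return pixel / 127.5 - 1


def unscale(value):
    # Hypothetical inverse: map a value from [-1, 1] back to [0, 255].
    return (value + 1) * 127.5


# Black, mid-gray, and white map to the ends and center of the range.
endpoints = [scale(p) for p in (0, 127.5, 255)]
print(endpoints)  # [-1.0, 0.0, 1.0]
```

The same arithmetic works element-wise on a NumPy array or a TensorFlow tensor, as long as the image is converted to a float type first.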

CodePudding user response：

MobileNet V1 (paper) accepts inputs of 224x224x3. The MobileNet V2 additions are mainly linear bottlenecks between layers and shortcut/skip connections, so I don't think the architecture's input dimensions have changed (Google AI blog post on MobileNetV2).

(This is based on my personal experience): I am almost certain the resizing is just a scaling of the image that maintains the original aspect ratio and zero-pads the result. Alternatively, they could scale it directly and change the aspect ratio, but this seems unlikely. They definitely aren't using anything like adaptive pooling for resizing.
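
To make the "scale, keep the aspect ratio, then zero-pad" idea concrete, here is a minimal sketch of the geometry only. It illustrates the guess above and is not confirmed to be what the exported model does. The function name `letterbox_geometry` and the centered padding are assumptions made for this example.

```python
def letterbox_geometry(height, width, target=300):
    """Return (new_h, new_w, pad_top, pad_left) for an aspect-preserving resize."""
    # Scale so that the longer side fits the target exactly.
    ratio = target / max(height, width)
    new_h = max(1, round(height * ratio))
    new_w = max(1, round(width * ratio))
    # Assumption: the leftover space is split evenly and filled with zeros.
    pad_top = (target - new_h) // 2
    pad_left = (target - new_w) // 2
    return new_h, new_w, pad_top, pad_left


# A 480x640 (height x width) image shrinks to 225x300 and is padded vertically.
print(letterbox_geometry(480, 640))  # (225, 300, 37, 0)
```

With the alternative mentioned above (direct scaling), the image would simply be stretched to 300x300 and no padding would be needed.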