ONNX model checker fails while ONNX runtime works fine when `tf.function` is used to decorate a member function


When a TensorFlow model contains a `tf.function`-decorated function with a for loop in it, the TF-to-ONNX conversion yields these warnings:

WARNING:tensorflow:From /Users/amit/Programs/lammps/kim/kliff/venv/lib/python3.7/site-packages/tf2onnx/tf_loader.py:706: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.graph_util.extract_sub_graph`
Cannot infer shape for model/ex_layer/PartitionedCall/while: model/ex_layer/PartitionedCall/while:3
Cannot infer shape for model/ex_layer/PartitionedCall/Identity: model/ex_layer/PartitionedCall/Identity:0
Cannot infer shape for Func/model/ex_layer/PartitionedCall/output/_3: Func/model/ex_layer/PartitionedCall/output/_3:0
Cannot infer shape for Identity: Identity:0
missing output shape for while/Identity_3:0
missing output shape for while/Identity_3:0
missing output shape for while/Identity_3:0
missing output shape for while/Identity_3:0
...

The obtained model runs fine through onnxruntime, but the model checker gives the following error:

Traceback (most recent call last):
  File "failed_example.py", line 85, in <module>
    onnx.checker.check_model(onnx.load("tmp.onnx"))
  File "venv/lib/python3.7/site-packages/onnx/checker.py", line 106, in check_model
    C.check_model(protobuf_string)
onnx.onnx_cpp2py_export.checker.ValidationError: Field 'shape' of type is required but missing.

Netron does not show any appreciable difference between the model with the decorated function and the one without it. I guess the error comes from the fact that the for loop is converted to a separate while-loop subgraph whose input shape is not defined. Yet it works perfectly without the tf.function decorator. Minimal replication code is below.
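
To pin down which value is missing its shape, the converted graph can be inspected directly. Here is a minimal diagnostic sketch (my own, separate from the replication code below) that loads the produced tmp.onnx and reports every while-loop (Loop) subgraph input or output whose tensor type lacks the `shape` field the checker complains about:

import onnx

model = onnx.load("tmp.onnx")
for node in model.graph.node:
    if node.op_type == "Loop":
        for attr in node.attribute:
            if attr.name == "body":  # the converted while-loop subgraph
                for vi in list(attr.g.input) + list(attr.g.output):
                    if not vi.type.tensor_type.HasField("shape"):
                        print(node.name, "->", vi.name, "is missing a shape")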


Code to replicate:

import tensorflow as tf
import numpy as np
import sys
import onnx
import onnxruntime
import tf2onnx

# =============================================================================
# Layer and its helper functions
# COMMENT IT OUT TO PASS ONNX CHECK
@tf.function(
    input_signature=[
    tf.TensorSpec(shape=[None,None], dtype=tf.int32),
    tf.TensorSpec(shape=[None,None], dtype=tf.float32),
    tf.TensorSpec(shape=None, dtype=tf.float32),
    ])
def extra_function(
    list1,
    list2,
    accum_var
    ):
    some_num = 4
    num_iter = tf.size(list1)//some_num
    for i in range(num_iter):
        xyz_i = list2[0, i * 3 : (i + 1) * 3]
        accum_var += tf.reduce_sum(xyz_i)
    return accum_var

class ExLayer(tf.keras.layers.Layer):
    def __init__(self):
        super().__init__()

    # Doesn't tf.function also create graphs out of called functions?
    # However, it does not seem to do that if the `call` function is decorated.
    # @tf.function(
    #     input_signature=[
    #     tf.TensorSpec(shape=[None,None], dtype=tf.float32),
    #     tf.TensorSpec(shape=[None,None], dtype=tf.int32),
    #     ])
    def call(self, list2, list1):
        accum_var = tf.constant(0.0)
        accum_var = extra_function(list1, list2, accum_var)
        return accum_var
# =============================================================================


# =============================================================================
# Example implementation

layer1 = tf.keras.layers.Input(shape=(1,))
layer2 = tf.keras.layers.Input(shape=(1,), dtype=tf.int32)
EL = ExLayer()(layer1, layer2)
model = tf.keras.models.Model(inputs=[layer1, layer2], outputs=EL)

# Define input data
list2_tf = tf.constant([[0.,0.,0.,1.,1.,1.,2.,2.,2.,3.,3.,3.]],dtype=tf.float32)
list1_tf = tf.constant([[0,1,2,-1,1,0,2,-1,2,0,1,-1]],dtype=tf.int32)
list2_np = np.array([[0.,0.,0.,1.,1.,1.,2.,2.,2.,3.,3.,3.]],dtype=np.float32)
list1_np = np.array([[0,1,2,-1,1,0,2,-1,2,0,1,-1]],dtype=np.int32)

# Save to onnx
model_proto, external_tensor_storage = tf2onnx.convert.from_keras(model,
            input_signature=[
                tf.TensorSpec(shape=[None,None], dtype=tf.float32, name="list2"),
                tf.TensorSpec(shape=[None,None], dtype=tf.int32, name="list1")
                ],
            opset=11,
            output_path="tmp.onnx")


# Load onnx runtime session
ort_session = onnxruntime.InferenceSession("tmp.onnx")
inputs = {"list2":list2_np, "list1":list1_np}

print("===================================================")
print("Original model evaluation:")
print(model([list2_tf, list1_tf]))
print("ORT session evaluation")
print(ort_session.run(None, inputs))
print("===================================================")

# Check with model checker
onnx.checker.check_model(onnx.load("tmp.onnx"))
  • ONNX version: 1.10.2
  • Python version: 3.7.7
  • TF version: 2.7.0


CodePudding user response:

The problem is in the way you specified the shape of accum_var.

In the input signature you have tf.TensorSpec(shape=None, dtype=tf.float32). Reading the code, I see that you are passing a scalar tensor. A scalar tensor is a 0-dimensional tensor, so you should use shape=[] instead of shape=None.
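
For reference, shape=None in a TensorSpec means the rank itself is unknown, while shape=[] declares a rank-0 (scalar) tensor. A quick standalone sketch of the difference:

import tensorflow as tf

spec_any_rank = tf.TensorSpec(shape=None, dtype=tf.float32)  # rank unknown
spec_scalar = tf.TensorSpec(shape=[], dtype=tf.float32)      # rank 0: a scalar

print(spec_any_rank.shape)                               # <unknown>
print(spec_scalar.shape)                                 # ()
print(spec_scalar.is_compatible_with(tf.constant(0.0)))  # True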

It runs here without warnings after annotating extra_function with:

tf.function(
    input_signature=[
    tf.TensorSpec(shape=[None,None], dtype=tf.int32),
    tf.TensorSpec(shape=[None,None], dtype=tf.float32),
    tf.TensorSpec(shape=[], dtype=tf.float32),
    ])
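
With the accumulator's rank pinned to 0, the while-loop state variable no longer has an unknown rank, so tf2onnx can (presumably) emit the shape field for it, and onnx.checker.check_model passes as well.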