py_function: Replace convolution operation but keep gradient-CodePudding

I am trying to train a LeNet-5 model in TensorFlow 2 while replacing all Dense's matrix multiplications and Conv2D's convolutions with custom ones written in C. More precisely, I would like to keep those gradients as they are by default, but use my implementation of these operations instead of TensorFlow's default ones. Also, I cannot perform my custom convolutions and matrix multiplications using TensorFlow, I must go through my C code, which is called through CTypes. Is there a way of doing so ?

What I tried so far is to use TensorFlow's @tf.experimental.dispatch_for_api to call a functions that uses tf.py_function, which, in turn, calls my C code. However is seems that in doing so the gradients are lost and the model cannot be trained. Is there any other way of doing it ?

@tf.experimental.dispatch_for_api(tf.matmul,
    {'a': ApproximateTensor},
    {'b': ApproximateTensor},
    {'a': tf.Tensor        , 'b': tf.Tensor        },
    {'a': ApproximateTensor, 'b': tf.Tensor        },
    {'a': tf.Tensor        , 'b': ApproximateTensor},
    {'a': ApproximateTensor, 'b': ApproximateTensor},
)
def custom_matmul(a, b, transpose_a=False, transpose_b=False, adjoint_a=False, adjoint_b=False, a_is_sparse=False, b_is_sparse=False, output_type=None):
    # tf.print('MATMUL')
    if not isinstance(a, ApproximateTensor):
        a = ApproximateTensor(a)
    if not isinstance(b, ApproximateTensor):
        b = ApproximateTensor(b)

    _, out_size = b.shape

    return ApproximateTensor(process.linear(a.values, b.values, tf.zeros(out_size)))

process.linear then does this:

def linear(input, kernel, bias):
    global c_approx

    def compute(input, kernel, bias):
        global c_approx

        # Extract dimensions
        batch_size, in_size = input.shape
        _, out_size = kernel.shape
        if batch_size is None:
            batch_size = 1

        # Create output
        output = np.zeros((batch_size, out_size), dtype=np.float32)

        # Compute
        for b in range(batch_size):
            output[b] = c_approx.custom_matmul(input[b], kernel[i])   bias

        return tf.convert_to_tensor(output)

    output = tf.py_function(compute, [input, kernel, bias], input.dtype)

    # Manually set output dimensions
    batch_size, _ = input.shape
    _, out_size = kernel.shape
    output.set_shape((batch_size, out_size))

    return output

In other words, I want the opposite of what the following code would do:

@tf.custom_gradient
def custom_conv(x):
    def grad_fn(dy):
        return dy

    return tf.nn.conv2d(x), grad_fn

I want to redefine Conv2D while keeping his default gradient.

Thanks in advance

CodePudding user response：

I actually figured it out thanks to this answer: https://stackoverflow.com/a/43952168/9675442

The idea was to do this:

y = tf.matmul(a, b)   tf.stop_gradient(compute(a, b) - tf.matmul(a, b))

I hope this will help others