I am trying to postprocess my model's prediction before computing the loss function, since my true data (y_train) is the outer product of the NN output. I have followed these steps:
- The operation I am trying to do, expressed in numpy, is:
import numpy as np

nX = 201
nT = 101
nNNout = nX + nT   # the NN output concatenates f (length nX) and g (length nT)
nBatch = 32

NNout = np.random.rand(nBatch, nNNout)
f = NNout[:, :nX]
g = NNout[:, nX:]

# column-major (Fortran-order) flatten of the outer product, per batch instance
test = np.empty([nBatch, nX * nT])
for i in range(nBatch):
    test[i, :] = np.outer(f[i, :], g[i, :]).flatten('F')
where the NN output concatenates f and g. What I actually need is the flattened (vectorised) outer product of f and g for each batch instance.
- I have translated this into a compact TensorFlow operation:
test2 = tf.Variable([tf.reshape(tf.transpose(tf.tensordot(f[i, :], g[i, :], axes=0)), [nX * nT])
                     for i in range(nBatch)])
which I have checked is correct and produces the same values as the numpy version in step 1.
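For completeness, this is a minimal sketch of that check, assuming eager execution and the test and test2 arrays from the snippets above:

print(np.allclose(test, test2.numpy()))  # expected: True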
- Then, I add this operation after my model's prediction:
n_epochs = 20
batch_size = 32
n_steps = len(x_train) // batch_size

optimizer = keras.optimizers.Nadam(learning_rate=0.01)
loss_fn = keras.losses.mean_squared_error
mean_loss = keras.metrics.Mean()
metrics = [keras.metrics.MeanAbsoluteError()]

# ------------ Training ------------
for epoch in range(1, n_epochs + 1):
    print("Epoch {}/{}".format(epoch, n_epochs))
    for step in range(1, n_steps + 1):
        X_batch, y_batch = random_batch(x_train, np.array(y_train))
        with tf.GradientTape() as tape:
            y_pred = model(X_batch, training=True)
            # post-process the prediction: flattened outer product per batch instance
            u_pred = tf.Variable([tf.reshape(tf.transpose(tf.tensordot(y_pred[i, :nX], y_pred[i, nX:], axes=0)),
                                             [nX * nT])
                                  for i in range(batch_size)])
            main_loss = tf.reduce_mean(loss_fn(y_batch, u_pred))
            loss = tf.add_n([main_loss] + model.losses)
        gradients = tape.gradient(loss, model.trainable_variables)
My main issue is that gradients becomes a list of Nones when I add this operation. If I simply compute the loss from my model's prediction (y_pred) directly, the code computes the gradients without problems.
Could you please help me find the error I am making here?
Answer:
You are creating a new (trainable) variable in u_pred, which breaks any dependency of u_pred on y_pred. The values match because you initialise the new variable with the prediction, but after initialisation the variable has no functional dependency on y_pred anymore, so no gradients flow.
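A tiny self-contained demonstration of the difference (hypothetical tensors, not from the question):

import tensorflow as tf

x = tf.constant([1.0, 2.0])
with tf.GradientTape(persistent=True) as tape:
    tape.watch(x)
    y = 3.0 * x
    v = tf.Variable(y)   # copies the value; severs the link to x
    s = tf.stack([y])    # a plain tensor op; keeps the link to x
    loss_v = tf.reduce_sum(v)
    loss_s = tf.reduce_sum(s)
print(tape.gradient(loss_v, x))  # None
print(tape.gradient(loss_s, x))  # tf.Tensor([3. 3.], shape=(2,), dtype=float32)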
I am guessing that you did that because you needed a tf.Tensor and not a list, and you ended up with type errors. You probably want something along the lines of tf.stack (or tf.concat), and not tf.Variable, for that.
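A minimal sketch of the corrected tape step, assuming the model, random_batch, nX, nT, and batch_size from the question:

with tf.GradientTape() as tape:
    y_pred = model(X_batch, training=True)
    # tf.stack keeps u_pred differentiable with respect to y_pred
    u_pred = tf.stack([tf.reshape(tf.transpose(tf.tensordot(y_pred[i, :nX], y_pred[i, nX:], axes=0)),
                                  [nX * nT])
                       for i in range(batch_size)])
    # fully vectorised alternative without the Python loop:
    # u_pred = tf.reshape(tf.einsum('bx,bt->btx', y_pred[:, :nX], y_pred[:, nX:]),
    #                     [batch_size, nX * nT])
    main_loss = tf.reduce_mean(loss_fn(y_batch, u_pred))
    loss = tf.add_n([main_loss] + model.losses)
gradients = tape.gradient(loss, model.trainable_variables)  # no longer a list of Nones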