Invalid argument error with TensorFlow 2 with self-defined loss function (Student t distribution)

This is a follow-up to a question that has already been answered. I would like to ask it formally here as a new question. The original question is:

Invalid argument error with TensorFlow 2 with self-defined loss function, although everything seems to be correct

As mentioned, I am currently training TensorFlow models to predict parameters of different distributions. For this purpose, I create appropriate layers and modify the loss functions.

Unfortunately, when I use a multivariate t-distribution (tfp.distributions.MultivariateStudentTLinearOperator), the following error results:

InvalidArgumentError:  Input matrix is not invertible.
     [[node negative_t_loss_2/negative_t_loss_2_MultivariateStudentTLinearOperator/log_prob/LinearOperatorLowerTriangular/solve/triangular_solve/MatrixTriangularSolve (defined at d:\20_programming\python\virtualenvs\tensorflow-gpu-2\lib\site-packages\tensorflow_probability\python\distributions\multivariate_student_t.py:265) ]] [Op:__inference_train_function_1471]

Function call stack:
train_function

This time, the procedure for defining the loss function is as follows:

def negative_t_loss_2(y_true, y_pred):
    # Separate the parameters
    n, mu1, mu2, sigma11, sigma12, sigma22 = tf.unstack(y_pred, num=6, axis=-1)
    mu = tf.transpose([mu1, mu2], perm=[1, 0])
    sigma = tf.linalg.LinearOperatorLowerTriangular(tf.transpose([[sigma11, sigma12], [sigma12, sigma22]], perm=[2, 0, 1]))
    dist = tfp.distributions.MultivariateStudentTLinearOperator(df=n, loc=mu, scale=sigma)
    nll = tf.reduce_mean(-dist.log_prob(y_true))
    return nll

I have copied the complete (somewhat more extensive) code and the required data to

https://drive.google.com/drive/folders/1IIAtKDB8paWV0aFVFALDUAiZTCqa5fAN?usp=sharing

(notebook "normdist_2D_not_working_t.ipynb").

The operating system I use is Windows 10 and the Python version is 3.6. All libraries used in the sample code, including tensorflow-gpu, are at their latest versions.

I would be very grateful if the problem could be solved. The topic is particularly relevant for the financial sector, since such distributions play a major role here, especially in risk management.

CodePudding user response:

The scale matrix passed to LinearOperatorLowerTriangular needs to be lower triangular. To convert the tensor to a linear operator, just replace

sigma = tf.linalg.LinearOperatorLowerTriangular(tf.transpose([[sigma11, sigma12], [sigma12, sigma22]], perm=[2, 0, 1]))

with:

sigma = tf.linalg.LinearOperatorLowerTriangular(tf.transpose([[sigma11, tf.zeros_like(sigma12)], [sigma12, sigma22]], perm=[2, 0, 1]))

Also, the degrees-of-freedom parameter n of the Student-t distribution must be positive, so you should add n = tf.keras.activations.softplus(n) in the negative_t_layer_2 function.
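To illustrate why softplus is a reasonable choice for n, here is a minimal sketch in plain Python of the function it computes, softplus(x) = log(1 + exp(x)). The helper name softplus below is only illustrative and is not taken from the notebook; it uses a numerically stable rearrangement of the same formula.

```python
import math


def softplus(x):
    # Stable form of log(1 + exp(x)): avoids overflow for large positive x.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# Raw network outputs can be negative or zero; softplus maps them to
# positive values, which is what the degrees-of-freedom parameter needs.
for raw in (-5.0, 0.0, 3.0):
    print(raw, softplus(raw))
```

For large positive inputs softplus(x) is close to x, so it changes the raw output very little when that output is already a sensible positive value.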

Then, it should work.