GPFlow Multiclass classification with vector inputs causes value error on shape mismatch


I am trying to follow the multiclass classification example in GPflow (v2.1.3) as described here:

https://gpflow.readthedocs.io/en/master/notebooks/advanced/multiclass_classification.html

The difference from the example is that my X vectors are 10-dimensional and there are 5 classes to predict, but there seems to be a dimensionality error when using the inducing variables. I changed the kernel and used dummy data for reproducibility; I am just looking to get this code to run. I have listed the dimensions of the variables below in case they are the issue. Any calculation of the loss raises an error like:

 ValueError: Dimensions must be equal, but are 10 and 5 for '{{node truediv}} = RealDiv[T=DT_DOUBLE](strided_slice_2, truediv/softplus/forward/IdentityN)' with input shapes: [200,10], [5].

It is as if it requires the Y values for the inducing variables, but the example on the GPflow site does not require them; or perhaps it is confusing the length of the X input with the number of classes to predict.

I tried expanding the dimension of Y as in the GPflow classification implementation, but it did not help.

Reproducible Code:

import gpflow
from gpflow.utilities import ops, print_summary, set_trainable
from gpflow.config import set_default_float, default_float, set_default_summary_fmt
from gpflow.ci_utils import ci_niter
import random
import numpy as np
import tensorflow as tf

np.random.seed(0)
tf.random.set_seed(123)

num_classes = 5
num_of_data_points = 1000
num_of_functions = num_classes
num_of_independent_vars = 10

data_gp_train = np.random.rand(num_of_data_points, num_of_independent_vars)
data_gp_train_target_hot = np.eye(num_classes)[
    np.array(random.choices(list(range(num_classes)), k=num_of_data_points))
].astype(bool)
data_gp_train_target = np.apply_along_axis(np.argmax, 1, data_gp_train_target_hot)
data_gp_train_target = np.expand_dims(data_gp_train_target, axis=1)


data_gp = ( data_gp_train, data_gp_train_target )

lengthscales = [0.1]*num_classes
variances = [1.0]*num_classes
kernel = gpflow.kernels.Matern32(variance=variances, lengthscales=lengthscales) 

# Robustmax Multiclass Likelihood
invlink = gpflow.likelihoods.RobustMax(num_of_functions)  # Robustmax inverse link function
likelihood = gpflow.likelihoods.MultiClass(num_of_functions, invlink=invlink)  # Multiclass likelihood

inducing_inputs = data_gp_train[::5].copy()  # inducing inputs (20% of obs are inducing)
# inducing_inputs = data_gp_train[:200,:].copy()  # inducing inputs (20% of obs are inducing)
   
m = gpflow.models.SVGP(
    kernel=kernel,
    likelihood=likelihood,
    inducing_variable=inducing_inputs,
    num_latent_gps=num_of_functions,
    whiten=True,
    q_diag=True,
)

set_trainable(m.inducing_variable, False)
print_summary(m)

opt = gpflow.optimizers.Scipy()
opt_logs = opt.minimize(
    m.training_loss_closure(data_gp), m.trainable_variables, options=dict(maxiter=ci_niter(1000))
)
print_summary(m, fmt="notebook")

Dimensions:

data_gp[0].shape
Out[132]: (1000, 10)

data_gp[1].shape
Out[133]: (1000, 1)

inducing_inputs.shape
Out[134]: (200, 10)


CodePudding user response:

When running your example I get a slightly different error, but the issue is in how you define the lengthscales and variances: the [200,10] vs [5] mismatch in the traceback is the (200, 10) matrix of inducing inputs being divided elementwise by your length-5 lengthscales vector inside the stationary kernel. You write:

lengthscales = [0.1]*num_classes
variances = [1.0]*num_classes
kernel = gpflow.kernels.Matern32(variance=variances, lengthscales=lengthscales)

But the standard single-output kernels require a scalar variance, and the lengthscales must be either a scalar or match the number of input features (here 10), not the number of classes. If you replace that code with the following:

lengthscales = [0.1]*num_of_independent_vars
kernel = gpflow.kernels.Matern32(variance=1.0, lengthscales=lengthscales)

then it all runs fine.

This gives you a single kernel shared across all outputs (class probabilities), with an independent lengthscale per input dimension ("ARD").

If you want a different kernel for each output (but, for example, an isotropic lengthscale), you can achieve this with a SeparateIndependent multi-output kernel; see the multioutput notebook example and the sketch below.
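As a minimal sketch of that approach (reusing the data, likelihood, and inducing_inputs from your code, and following the API shown in the multioutput notebook; I have dropped q_diag=True here, since the notebook uses a full q_sqrt and I have not checked that the diagonal parameterization works with multi-output conditionals):

# One isotropic Matern32 kernel per class / latent GP:
kernels = [
    gpflow.kernels.Matern32(variance=1.0, lengthscales=0.1)
    for _ in range(num_of_functions)
]
kernel = gpflow.kernels.SeparateIndependent(kernels)

# Share a single set of inducing points across all latent GPs:
inducing_variable = gpflow.inducing_variables.SharedIndependentInducingVariables(
    gpflow.inducing_variables.InducingPoints(inducing_inputs)
)

m = gpflow.models.SVGP(
    kernel=kernel,
    likelihood=likelihood,
    inducing_variable=inducing_variable,
    num_latent_gps=num_of_functions,
    whiten=True,
)

Each kernel in the list then gets its own trainable variance and lengthscale, while the inducing point locations are shared.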
