Is there a way to optimize the calculation of Bernoulli Log-Likelihoods for many multivariate samples?


I currently have two Torch Tensors, p and x, which both have the shape of (batch_size, input_size).

I would like to calculate the Bernoulli log-likelihoods for the given data and return a tensor of size (batch_size).

Here's an example of what I'd like to do. I have the formula for the log-likelihood of Bernoulli random variables:

\sum_{i=1}^{d} x_i \ln(p_i) + (1 - x_i) \ln(1 - p_i)

Say I have the p tensor: [[0.6, 0.4, 0], [0.33, 0.34, 0.33]] And say I have the x tensor for the binary inputs based on those probabilities:

[[1, 1, 0], [0, 1, 1]]

And I want to calculate the log likelihood for every sample, which would result in:

[[ln(0.6) + ln(0.4)], [ln(0.67) + ln(0.34) + ln(0.33)]]

Would it be possible to do this computation without the use of for loops? I know I could use torch.sum(axis=1) to do the final summation of the logs, but is it possible to do the Bernoulli log-likelihood computation itself without for loops, or with at most one for loop? I am trying to vectorize this operation as much as possible. I could've sworn we could use LaTeX for equations before; did something change, or is it another website?

CodePudding user response:

Though not a good practice, you can apply the formula directly to the tensors as follows (this works because these are element-wise operations):

import torch
p = torch.tensor([
    [0.6, 0.4, 0],
    [0.33, 0.34, 0.33]
])

x = torch.tensor([
    [1., 1, 0],
    [0, 1, 1]
])

eps = 1e-8  # small constant to avoid log(0)
# element-wise Bernoulli log-likelihood, summed over the feature dimension
bll1 = (x * torch.log(p + eps) + (1 - x) * torch.log(1 - p + eps)).sum(axis=1)
print(bll1)
#tensor([-1.4271162748, -2.5879497528])

Note that to avoid a log(0) error, I have introduced a very small constant eps inside the logarithms.
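Without it, any zero probability makes the log term -inf, and multiplying that by a zero in x then produces nan; a quick illustration:

print(torch.log(torch.tensor(0.)))
# tensor(-inf)
print(torch.tensor(0.) * torch.log(torch.tensor(0.)))
# tensor(nan)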

A better way to do this is to use BCELoss from the nn module in PyTorch.

import torch.nn as nn
bce = nn.BCELoss(reduction='none')  # keep the per-element losses instead of reducing them
bll2 = -bce(p, x).sum(axis=1)       # negate the loss to recover log-likelihoods
print(bll2)
#tensor([-1.4271162748, -2.5879497528])

Since PyTorch computes the BCE as a loss, it prefixes your formula with a negative sign. The argument reduction='none' says that the computed losses should not be reduced (averaged/summed) across the batch in any way. This approach is advisable since you do not need to handle numerical stability and error cases manually (such as adding eps above).
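With reduction='none', bce(p, x) returns the element-wise losses, so it has the same shape as the inputs, and summing over axis=1 gives one value per sample:

print(bce(p, x).shape)
# torch.Size([2, 3])
print(bll2.shape)
# torch.Size([2])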

You can verify that the two solutions indeed return the same tensor (up to a tolerance):

torch.allclose(bll1, bll2)
# True

Or compare the element-wise tensors (without summing each row):

torch.allclose(x * torch.log(p + eps) + (1 - x) * torch.log(1 - p + eps), -bce(p, x))
# True
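
If you prefer working with distribution objects, torch.distributions also offers a Bernoulli class whose log_prob should give the same per-element log-likelihoods (it clamps the probabilities internally, so no manual eps should be needed); a minimal sketch:

from torch.distributions import Bernoulli
bll3 = Bernoulli(probs=p).log_prob(x).sum(axis=1)
torch.allclose(bll1, bll3)
# True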

Feel free to ask for further clarifications.
