Efficient computation of entropy-like formula (sum(xlogx)) in Python-CodePudding

I'm looking for an efficient way to compute the entropy of vectors, without normalizing them and while ignoring any non-positive value.

Since the vectors aren't probability vectors, and shouldn't be normalized, I can't use scipy's entropy function.

So far I couldn't find a single numpy or scipy function to obtain this, and as a result my alternatives involve breaking the computation into 2 steps, which involve intermediate arrays and slow down the run time. If anyone can think of a single function for this computation it will be interseting.

Below is a timeit script for measuring several alternatives at I tried. I'm using a pre-allocated array to avoid repeated allocations and deallocations during run-time. It's possible to select which alternative to run by setting the value of func_code. I included the nansum offered by one of the answers. The measurements on My MacBook Pro 2019 are: matmul: 16.720187613 xlogy: 17.296380516 nansum: 20.059866123000003

import timeit

import numpy as np
from scipy import special


def matmul(arg):
    a, log_a = arg
    log_a.fill(0)
    np.log2(a, where=a > 0, out=log_a)
    return (a[:, None, :] @ log_a[..., None]).ravel()


def xlogy(arg):
    a, log_a = arg
    a[a < 0] = 0
    return np.sum(special.xlogy(a, a), axis=1) * (1/np.log(2))


def nansum(arg):
    a, log_a = arg
    return np.nansum(a * np.log2(a, out=log_a), axis=1)


def setup():
    a = np.random.rand(20, 1000) - 0.1
    log = np.empty_like(a)
    return a, log


setup_code = """
from __main__ import matmul, xlogy, nansum, setup
data = setup() 
"""
func_code = "matmul(data)"

print(timeit.timeit(func_code, setup=setup_code, number=100000))

CodePudding user response：

Having np.log2 of negative numbers (or zero) just gives a runtime warning and sets those values to np.nan, which is probably the best way to deal with them. If you don't want them to pollute your sum, just use

np.nansum(v_i*np.log2(v_i))

CodePudding user response：

On my machine the computation of the logarithms takes about 80% of the time of matmul so it is definitively the bottleneck an optimizing other functions will result in a negligible speed up.

The bad news is that the default implementation np.log is not yet optimized on most platforms. Indeed, it is not vectorized by default, except on recent x86 Intel processors supporting AVX-512 (ie. basically Skylake processors on servers and IceLake processors on PCs, not recent AlderLake though). This means the computation could be significantly faster once vectorized. AFAIK, the close-source SVML library do support AVX/AVX2 and could speed up it (on x86-64 processors only). SMVL is supported by Numexpr and Numba which can be faster because of that assuming you have access to the non-free SVML which is a part of Intel tools often available on HPC machines (eg. like MKL, OneAPI, etc.).

If you do not have access to the SVML there are two possible remaining options:

Implement your own optimized SIMD log2 function which is possible but hard since it require a good understanding of the hardware SIMD units and certainly require to write a C or Cython code. This solutions consists in computing the log2 function as a n-degree polynomial approximation (it can be exact to 1 ULP with a big n though one generally do not need that). Naive approximations (eg. n=1) are much simple to implement but often too inaccurate for a scientific use).
Implement a multi-threaded log computation typically using Numba/Cython. This is a desperate solution as multithreading can slow things down if the input data is not large enough.

Here is an example of multi-threaded Numba solution:

import numba as nb

@nb.njit('(UniTuple(f8[:,::1],2),)', parallel=True)
def matmul(arg):
    a, log_a = arg
    result = np.empty(a.shape[0])
    for i in nb.prange(a.shape[0]):
        s = 0.0
        for j in range(a.shape[1]):
            if a[i, j] > 0:
                s  = a[i, j] * np.log2(a[i, j])
        result[i] = s
    return result

This is about 4.3 times faster on my 6-core PC (200 us VS 46.4 us). However, you should be careful if you run this on a server with many cores on such small dataset as it can actually be slower on some platforms.