How to use numpy.bincount with a weight vector of a non-built-in type?

Time:11-29

I am using numpy.bincount. I have a vector of indices ind and a vector of weights coef, and I am trying to run np.bincount(ind, coef). The problem is that my weight vector is not of type float64; its elements are instances of a non-built-in class that supports the arithmetic operator __add__.

How can I do this? Running np.bincount(ind, coef) directly gives me the error

TypeError: Cannot cast array data from dtype('O') to dtype('float64') according to the rule 'safe'

The specific type I am considering is LaurentPolynomialRing from SageMath.

CodePudding user response:

bincount is compiled code, so we can't (readily) see what it does; we can only deduce things from the behavior.

The basic count (x is the index array defined in the next snippet):

In [303]: np.bincount(x)
Out[303]: array([1, 2, 3])

Now adapting the weights example to provide an int array of weights:

In [304]: #w = np.array([0.3, 0.5, 0.2, 0.7, 1., -0.6]) # weights
     ...: w = np.array([3,5,2,7,10,-6])
     ...: x = np.array([0, 1, 1, 2, 2, 2])
     ...: np.bincount(x,  weights=w)
Out[304]: array([ 3.,  7., 11.])

This is consistent with your error. The result is float even when the weights are int, so the weights must have been converted to float.
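A small sketch of why the conversion succeeds for int weights but fails for object weights. It uses fractions.Fraction as a stand-in for the Sage type (my assumption, so the snippet runs without Sage) and np.can_cast to check the 'safe' casting rule named in the error message:

```python
from fractions import Fraction
import numpy as np

x = np.array([0, 1, 1, 2, 2, 2])
w_int = np.array([3, 5, 2, 7, 10, -6])
# Fraction is only a stand-in for a non-built-in numeric class.
w_obj = np.array([Fraction(1, 2), Fraction(1, 3), Fraction(1, 6),
                  Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)], dtype=object)

# int -> float64 is a 'safe' cast, object -> float64 is not.
int_ok = np.can_cast(w_int.dtype, np.float64, casting="safe")
obj_ok = np.can_cast(w_obj.dtype, np.float64, casting="safe")

# bincount therefore accepts w_int but rejects w_obj.
try:
    np.bincount(x, weights=w_obj)
    raised = False
except TypeError:
    raised = True
```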

It might do something like this, but in compiled code:

In [306]: res = np.zeros(3)
In [307]: for i,v in zip(x,w):
     ...:     res[i] += v
     ...: 
In [308]: res
Out[308]: array([ 3.,  7., 11.])

I'm guessing this because it returns a result for every integer value from 0 up to x.max(). Written like this, it only requires the elements of w to support __add__ (and res to be something that can hold them, such as an object-dtype array rather than a float one). But this kind of iteration over an object-dtype array is slow, even in compiled code, since it has to call the __add__ of each element object. It can't just zip through the byte data-buffer of the w array.
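As a rough sketch of that loop for arbitrary objects: the helper name bincount_objects and its zero parameter are made up for illustration, and Fraction again stands in for the Sage type. It accumulates into a plain Python list so nothing ever gets cast to float:

```python
from fractions import Fraction
import numpy as np

def bincount_objects(indices, weights, zero=0):
    """Sum weights[k] into bin indices[k] using only the elements' own +.

    Hypothetical helper, not part of NumPy. `zero` is the additive
    identity of the weight type (assumed to be supplied by the caller).
    """
    indices = np.asarray(indices)
    n_bins = int(indices.max()) + 1 if indices.size else 0
    res = [zero] * n_bins          # one slot per integer 0..max(indices)
    for i, v in zip(indices, weights):
        res[i] = res[i] + v        # calls type(res[i]).__add__ (or __radd__ of v)
    return res

x = [0, 1, 1, 2, 2, 2]
w = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6),
     Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]
totals = bincount_objects(x, w, zero=Fraction(0))
```

With the default zero=0 the first addition in each bin is 0 + v, so the weight type would also need an __radd__ that accepts the int 0; passing the type's own zero avoids relying on that.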

Without the consecutive bin value constraint, a defaultdict is an easy tool for collecting like values.

In [309]: from collections import defaultdict
In [310]: dd = defaultdict(float)
In [311]: for i,v in zip(x,w):
     ...:     dd[i] += v
     ...: 
In [312]: dd
Out[312]: defaultdict(float, {0: 3.0, 1: 7.0, 2: 11.0})
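For non-float weights the default factory has to return a zero of the right type instead of float. A minimal sketch with Fraction as the stand-in type (Fraction() happens to return 0; for a Sage ring you would presumably pass a factory returning the ring's zero element, which I haven't tested):

```python
from collections import defaultdict
from fractions import Fraction

x = [0, 1, 1, 5]   # bins need not be consecutive here
w = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6), Fraction(3, 4)]

totals = defaultdict(Fraction)   # Fraction() == 0 acts as the starting value
for i, v in zip(x, w):
    totals[i] += v               # uses Fraction.__add__, no float conversion
```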

Another way, again where the x values are indices into the returned array:

In [313]: res = np.zeros(3)
In [315]: np.add.at(res, x, w)
In [316]: res
Out[316]: array([ 3.,  7., 11.])

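I think all of these will work with objects that implement __add__, as long as the accumulator can hold them (for example np.zeros(3, dtype=object) instead of a float array, or a defaultdict whose factory returns a suitable zero).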
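Here is a sketch of the np.add.at variant with an object-dtype accumulator, once more using Fraction as an assumed stand-in for the Sage elements. np.zeros(..., dtype=object) fills the array with the Python int 0, so the weight type has to be able to add itself to 0:

```python
from fractions import Fraction
import numpy as np

x = np.array([0, 1, 1, 2, 2, 2])
w = np.array([Fraction(1, 2), Fraction(1, 3), Fraction(1, 6),
              Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)], dtype=object)

# Object-dtype accumulator: elements stay Python objects, no float cast.
res = np.zeros(x.max() + 1, dtype=object)
np.add.at(res, x, w)   # unbuffered add, so repeated indices accumulate
```

This is still a Python-level addition per element, so don't expect bincount-like speed from it.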
