Numpy function that returns the minimum dtype to hold the objects in the sequence?

Time:05-30

As stated, is there a NumPy function that returns the minimum dtype required when creating an array? For example, if the input is the list [3., 4.] and dtype is not specified, np.array chooses numpy.float64; if the input is [3, 4], np.array chooses the platform's default integer type (numpy.int32 in the session below; on most 64-bit Linux and macOS builds it is numpy.int64):

>>> x = np.array([3., 4.])
>>> x.dtype.type
numpy.float64

>>> x = np.array([3, 4])
>>> x.dtype.type
numpy.int32

I'd like to create a class such that, if the input list contains floats, it stores a NumPy array whose dtype is np.float32 rather than the np.float64 that np.array picks by default:

class Tensor:
    
    def __init__(self, data):
        self.data = np.array(data)
    
    def __repr__(self):
        return str(self.data)
    
    @property
    def dtype(self):
        return self.data.dtype

Currently, if you create a Tensor from a list such as [1., 2.], its dtype is np.float64:

>>> Tensor([1.,2.]).dtype.type is np.float64
True

CodePudding user response:

To answer my own question: I found the function I was looking for while searching the NumPy documentation. It is numpy.min_scalar_type, available since version 1.6.0.
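As a quick illustration of what numpy.min_scalar_type returns, here is a minimal sketch. The sample values are arbitrary; the point is that the "minimum" search only applies to scalars, while a list or array simply gets back the dtype NumPy would infer for it anyway.

```python
import numpy as np

# Scalars: the smallest dtype that can represent the value.
print(np.min_scalar_type(10))      # uint8
print(np.min_scalar_type(-260))    # int16
print(np.min_scalar_type(3.1))     # float16
print(np.min_scalar_type(1e50))    # float64

# Non-scalar input: the inferred array dtype is returned unchanged,
# so a list of Python floats still reports float64.
print(np.min_scalar_type([3.0, 4.0]))  # float64
```

This is why the class below only uses the result to decide which family (float, integer, complex) the data belongs to, and then picks the 32-bit default itself.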

What I want is a Tensor whose default dtype is float32 when the input contains floats. This is handy when testing machine learning algorithms with NumPy (or with CuPy on a GPU):

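import numpy as np

class Tensor:
    def __init__(self, data, dtype=None):
        if dtype is None:
            # For a scalar this is the smallest dtype that holds the value;
            # for a list or array it is the dtype np.array would infer.
            min_dtype = np.min_scalar_type(data).type
            if issubclass(min_dtype, np.floating):
                dtype = np.float32
            elif issubclass(min_dtype, np.integer):
                dtype = np.int32
            elif issubclass(min_dtype, np.complexfloating):
                dtype = np.complex64
        # For any other kind (bool, str, ...) dtype stays None and NumPy infers it.
        self.data = np.array(data, dtype=dtype)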
    
    def __repr__(self):
        return str(self.data)
    
    @property
    def dtype(self):
        return self.data.dtype

Now the defaults are float32, int32 and complex64:

>>> Tensor(1.).dtype.type
numpy.float32

>>> Tensor(1).dtype.type
numpy.int32

>>> Tensor(1+2j).dtype.type
numpy.complex64
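Because min_scalar_type hands back the inferred dtype for list input, the same defaulting rule can also be keyed on the dtype's kind code. The following is only a sketch of that alternative: the names `_NARROW_DEFAULTS` and `narrow_array` are hypothetical and not part of NumPy or of the class above.

```python
import numpy as np

# Hypothetical mapping from dtype kind code to the narrower default.
# "u" (unsigned) maps to int32 to mirror the np.integer check above.
_NARROW_DEFAULTS = {
    "f": np.float32,
    "i": np.int32,
    "u": np.int32,
    "c": np.complex64,
}

def narrow_array(data, dtype=None):
    """Build an array, defaulting to 32-bit floats/ints and complex64."""
    arr = np.asarray(data)
    if dtype is None:
        # Kinds without an entry (bool, str, object, ...) keep their dtype.
        dtype = _NARROW_DEFAULTS.get(arr.dtype.kind, arr.dtype)
    return arr.astype(dtype, copy=False)

print(narrow_array([3.0, 4.0]).dtype)                 # float32
print(narrow_array([1 + 2j]).dtype)                   # complex64
print(narrow_array([3, 4], dtype=np.float64).dtype)   # float64
```

Either way the narrowing is unconditional: values that need more than 32 bits (very large integers, or floats with more than about seven significant digits) will not survive it unchanged.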

CodePudding user response:

As per my comment above, you can create both arrays and check if values are different and then return a 32-bit version if there's no difference:

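import numpy as np

def return_min_dtype(it):
    np64 = np.array(it, dtype=np.float64)
    np32 = np.array(it, dtype=np.float32)
    # A bare `np64 == np32` is elementwise and raises inside `if` for arrays
    # with more than one element, so compare the arrays as a whole.
    if np.array_equal(np64, np32):
        return np32
    else:
        return np64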


arr_1 = [1.5]
arr_2 = [1.5555555555555555555555555555555555555555555555555]

a = return_min_dtype(arr_1)
b = return_min_dtype(arr_2)
print(a.dtype, b.dtype)     # float32 float64

The only concern here is that for large arrays it may be difficult to hold both copies in memory at once.
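One way to soften the memory concern is to run the comparison in chunks and return only the dtype, so that a full float32 copy is never built just to be thrown away. This is a rough sketch under assumptions: the name `min_float_dtype` and the `chunk_size` default are made up for illustration, and the float64 array itself still has to fit in memory.

```python
import numpy as np

def min_float_dtype(values, chunk_size=1_000_000):
    """Return np.float32 if every value survives a round trip through
    float32 unchanged, otherwise np.float64."""
    flat = np.asarray(values, dtype=np.float64).ravel()
    for start in range(0, flat.size, chunk_size):
        chunk = flat[start:start + chunk_size]
        # Only one chunk-sized float32 temporary exists at a time.
        # NaN never compares equal, so data containing NaN stays float64.
        if not np.array_equal(chunk.astype(np.float32), chunk):
            return np.float64  # stop at the first lossy chunk
    return np.float32

print(min_float_dtype([1.5]).__name__)                 # float32
print(min_float_dtype([1.5555555555555556]).__name__)  # float64
```

The caller can then create the final array once with the returned dtype, for example np.array(values, dtype=min_float_dtype(values)).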
