As stated, is there a numpy function that returns the minimum dtype required when creating an array? For example, if the input is a list [3., 4.] and dtype is not specified, then np.array will choose numpy.float64 as the dtype; if the input is [3, 4], then np.array will choose numpy.int32 as the dtype:
>>> x = np.array([3., 4.])
>>> x.dtype.type
numpy.float64
>>> x = np.array([3, 4])
>>> x.dtype.type
numpy.int32
I'd like to create a class such that, if the input list contains floats, it returns a NumPy array whose dtype is np.float32, not np.float64 as in np.array:
import numpy as np

class Tensor:
    def __init__(self, data):
        # np.array infers the dtype (float64 for Python floats)
        self.data = np.array(data)

    def __repr__(self):
        return str(self.data)

    @property
    def dtype(self):
        return self.data.dtype
Currently, if you create a Tensor with a list of floats such as [1., 2.], its dtype is np.float64:
>>> Tensor([1.,2.]).dtype.type is np.float64
True
CodePudding user response:
To answer my own question: I found the desired function while searching the NumPy documentation. It is called numpy.min_scalar_type, which was added in version 1.6.0.
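For reference, here is a small sketch of how np.min_scalar_type behaves, as far as I understand the documentation. For a scalar it looks at the value and returns the smallest dtype of the same kind that can hold it; for non-scalar input such as a list it simply returns the dtype NumPy would infer, unmodified. The dictionary and variable names below are only for illustration.

```python
import numpy as np

# Scalars: the value itself decides the smallest dtype.
scalar_results = {
    "small positive int": np.min_scalar_type(10),
    "negative int": np.min_scalar_type(-260),
    "float": np.min_scalar_type(3.1),
    "large float": np.min_scalar_type(1e50),
    "complex": np.min_scalar_type(1 + 2j),
}

# Non-scalar input: the inferred array dtype is returned unchanged,
# so a list of Python floats still reports float64.
list_result = np.min_scalar_type([3.0, 4.0])

for label, dt in scalar_results.items():
    print(label, dt)
print("list of floats", list_result)
```

So for lists the function mainly tells you the kind of the data (float, integer, complex), which is all that is needed to pick a 32-bit default.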
What I want is to create a Tensor such that, when the input list contains floats, its default dtype is float32. This is quite handy when testing machine learning algorithms with NumPy (or CuPy on a GPU):
import numpy as np

class Tensor:
    def __init__(self, data, dtype=None):
        # Only the kind of this dtype (float, integer, complex) is used below
        min_dtype = np.min_scalar_type(data).type
        if dtype is None:
            # Pick a 32-bit default for each numeric kind
            if issubclass(min_dtype, np.floating):
                dtype = np.float32
            elif issubclass(min_dtype, np.integer):
                dtype = np.int32
            elif issubclass(min_dtype, np.complexfloating):
                dtype = np.complex64
        # If dtype is still None (e.g. bool input), np.array infers it as usual
        self.data = np.array(data, dtype=dtype)

    def __repr__(self):
        return str(self.data)

    @property
    def dtype(self):
        return self.data.dtype
Now the defaults are float32, int32 and complex64:
>>> Tensor(1.).dtype.type
numpy.float32
>>> Tensor(1).dtype.type
numpy.int32
>>> Tensor(1 + 2j).dtype.type
numpy.complex64
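A possible variation, in case it helps: since only the kind of the data matters, the same defaults could probably be picked from the dtype.kind code of an array that is converted once. This is just a sketch, not a drop-in replacement; DEFAULT_BY_KIND and to_default_array are names I made up for illustration, and values outside the int32 range are not checked here.

```python
import numpy as np

# Hypothetical mapping from dtype "kind" code to the preferred 32-bit default.
# "u" (unsigned) maps to int32 to mirror the np.integer branch of the class above.
DEFAULT_BY_KIND = {
    "f": np.float32,
    "i": np.int32,
    "u": np.int32,
    "c": np.complex64,
}

def to_default_array(data, dtype=None):
    """Convert data to an array once, then narrow it to a 32-bit default."""
    arr = np.asarray(data)
    if dtype is None:
        # Returns None for kinds without a default (bool, str, object, ...)
        dtype = DEFAULT_BY_KIND.get(arr.dtype.kind)
    if dtype is None:
        return arr
    # copy=False avoids an extra copy when the dtype already matches
    return arr.astype(dtype, copy=False)

print(to_default_array([3.0, 4.0]).dtype)
print(to_default_array([3, 4]).dtype)
print(to_default_array(1 + 2j).dtype)
```

An explicit dtype argument still wins over the defaults, as in the Tensor class.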
CodePudding user response:
As per my comment above, you can create both arrays and check if values are different and then return a 32-bit version if there's no difference:
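import numpy as np

def return_min_dtype(it):
    np64 = np.array(it, dtype=np.float64)
    np32 = np.array(it, dtype=np.float32)
    # array_equal yields a single bool; a bare == raises for more than one element
    if np.array_equal(np64, np32):
        return np32
    else:
        return np64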
arr_1 = [1.5]
arr_2 = [1.5555555555555555555555555555555555555555555555555]
a = return_min_dtype(arr_1)
b = return_min_dtype(arr_2)
print(a.dtype, b.dtype) # float32 float64
The only concern here is memory: the float64 and float32 copies both exist at the same time, which might be a problem if the arrays are large.
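If memory is the issue, one way around it might be to do the same comparison chunk by chunk on an existing float64 array, so that only a small float32 copy exists at any moment. This is a rough sketch under that assumption; fits_in_float32 and chunk_size are hypothetical names, and NaN values are treated as "does not fit" because NaN never compares equal.

```python
import numpy as np

def fits_in_float32(arr64, chunk_size=1_000_000):
    """Return True if every value survives a round trip through float32."""
    flat = np.ravel(arr64)  # a view when the array is contiguous
    for start in range(0, flat.size, chunk_size):
        chunk = flat[start:start + chunk_size]
        # Only one chunk is converted to float32 at a time
        if not np.array_equal(chunk.astype(np.float32), chunk):
            return False
    return True

exact = np.array([1.5, 0.25, 2.0])   # all exactly representable in float32
inexact = np.array([1.5, 0.1])       # 0.1 loses precision in float32

print(fits_in_float32(exact))
print(fits_in_float32(inexact))
```

When the check passes, the array can then be converted with astype(np.float32) without losing any values.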