Finding smallest dtype to safely cast an array to

Time:09-14

I want to find the smallest data type that this array can be safely cast to, so that I can store it as compactly as possible. (The expected output is int8.)

import numpy as np
arr = np.array([-101, 125, 6], dtype=np.int64)

The most obvious approach seems to be something like

np.min_scalar_type(arr) # dtype('int64')

but np.min_scalar_type only inspects the value of a scalar. For an array it just returns the array's existing data type.
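A minimal sketch of that difference, using the same array as above (the variable names scalar_result and array_result are only for illustration):

```python
import numpy as np

arr = np.array([-101, 125, 6], dtype=np.int64)

# Scalar input: the value itself is inspected.
scalar_result = np.min_scalar_type(-101)

# Array input: the values are ignored and the array's dtype comes back unchanged.
array_result = np.min_scalar_type(arr)

print(scalar_result, array_result)
```

So the function has to be applied to individual values such as the minimum and maximum, not to the array as a whole.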

The next thing I tried is this:

np.promote_types(np.min_scalar_type(arr.min()), np.min_scalar_type(arr.max())) # dtype('int16')

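but that still doesn't output the smallest possible data type: the minimum maps to int8 and the maximum to uint8, and the smallest type that can hold both of those is int16.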
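Broken into steps, the attempt above looks roughly like this (min_type, max_type and combined are illustrative names, not part of NumPy):

```python
import numpy as np

arr = np.array([-101, 125, 6], dtype=np.int64)

# A negative value needs a signed type.
min_type = np.min_scalar_type(arr.min())

# A non-negative value gets an unsigned type, even though 125 would also fit in int8.
max_type = np.min_scalar_type(arr.max())

# promote_types only sees the two dtypes, not the values,
# so it has to cover the full range of both int8 and uint8.
combined = np.promote_types(min_type, max_type)

print(min_type, max_type, combined)
```

The value 125 would fit in int8, but that information is lost once it has been turned into uint8.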

What's a good way to achieve this?

CodePudding user response:

Here's a working solution I wrote. It will only work for integers.

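def smallest_dtype(arr):
    # Integer arrays only: compare the value range against each candidate's limits.
    arr_min = arr.min()
    arr_max = arr.max()
    # Candidates are ordered by size, unsigned before signed at each width.
    for dtype_str in ["u1", "i1", "u2", "i2", "u4", "i4", "u8", "i8"]:
        info = np.iinfo(np.dtype(dtype_str))
        if arr_min >= info.min and arr_max <= info.max:
            return np.dtype(dtype_str)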
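
If you want the function to fail loudly instead of silently returning a wrong answer for non-integer input, a slightly more defensive variant might look like the sketch below. The name smallest_int_dtype, the CANDIDATES tuple and the choice of exceptions are my own assumptions, not anything NumPy provides.

```python
import numpy as np

# Same candidate order as above: smallest first, unsigned before signed.
CANDIDATES = ("u1", "i1", "u2", "i2", "u4", "i4", "u8", "i8")

def smallest_int_dtype(arr):
    """Return the first candidate integer dtype whose range covers arr."""
    arr = np.asarray(arr)
    if arr.dtype.kind not in "iu":
        raise TypeError("only integer arrays are supported")
    if arr.size == 0:
        raise ValueError("cannot infer a dtype from an empty array")
    # Convert to Python ints so the comparisons never overflow.
    lo, hi = int(arr.min()), int(arr.max())
    for code in CANDIDATES:
        info = np.iinfo(np.dtype(code))
        if info.min <= lo and hi <= info.max:
            return np.dtype(code)
    raise ValueError("no candidate integer dtype covers the value range")

arr = np.array([-101, 125, 6], dtype=np.int64)
compact = arr.astype(smallest_int_dtype(arr))
print(compact.dtype, compact.nbytes)
```

Because the range was checked first, the astype call cannot wrap any values.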

CodePudding user response:

This is close to your initial idea:

np.result_type(np.min_scalar_type(arr.min()), arr.max())

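Because arr.max() is passed as a value rather than as a dtype, np.result_type keeps the signed int8 obtained from arr.min() whenever arr.max() fits inside it. Note that this relies on the value-based casting of scalars found in NumPy 1.x. As far as I can tell, under the NumPy 2 promotion rules (NEP 50) a NumPy scalar like arr.max() contributes its full dtype, so the same expression returns int64 there.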
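
If you would rather not depend on how np.result_type treats scalar arguments, the same idea can be sketched using only np.min_scalar_type and np.promote_types. The helper name smallest_dtype_from_range is hypothetical, and the sketch assumes a non-empty integer array.

```python
import numpy as np

def smallest_dtype_from_range(arr):
    """Smallest integer dtype covering arr, assuming a non-empty integer array."""
    lo, hi = int(arr.min()), int(arr.max())
    if lo >= 0:
        # No negative values: the unsigned type chosen for the maximum is enough.
        return np.min_scalar_type(hi)
    if hi < 0:
        # Everything is negative: the minimum alone decides the signed width.
        return np.min_scalar_type(lo)
    # Mixed signs: a signed n-bit type holds hi exactly when it holds -hi - 1,
    # so ask for the signed type of that mirrored value instead of hi itself.
    return np.promote_types(np.min_scalar_type(lo), np.min_scalar_type(-hi - 1))

arr = np.array([-101, 125, 6], dtype=np.int64)
print(smallest_dtype_from_range(arr))
```

Mirroring the maximum keeps both operands signed, which avoids the int8 plus uint8 promotion that produced int16 in the question.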
