How to copy dtype when doing numpy array assignment or when appending to a numpy array

I'm quite inexperienced with Python/numpy.

I have the following piece of code:

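import numpy as np

def collect_data():
    data = np.array([])  # starts as an empty float64 array
    for i in range(10):
        data = np.append(data, GetData())  # GetData() is defined elsewhere
    return data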

GetData() returns a numpy array with a custom dtype. However, when I run the code above, the numbers are converted to float64, which I suspect is the cause of other issues I'm having. How can I copy/append the output of the function while preserving its dtype?
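A minimal reproduction of the promotion, assuming GetData() returns an int32 array. `get_data` below is a hypothetical stand-in for the asker's function:

```python
import numpy as np

def get_data():
    # Hypothetical stand-in for GetData(): returns a small int32 array
    return np.arange(3, dtype=np.int32)

data = np.array([])  # empty array, but its dtype is float64
for i in range(10):
    # float64 combined with int32 promotes to float64
    data = np.append(data, get_data())

print(data.dtype)  # float64
print(data.shape)  # (30,)
```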

Answer:

Given the comments stating that you will only know the type of data once you run GetData(), and that multiple types are expected, you could do it like so:

import numpy as np

# [...]

dataByType = {}  # dictionary mapping each dtype encountered to an array of that dtype

for i in range(10):
    newData = GetData()
    if newData.dtype not in dataByType:
        # If the dtype has not been encountered yet,
        # create an empty array with that dtype and store it in the dict
        dataByType[newData.dtype] = np.array([], dtype=newData.dtype)
    # Append the new data to the corresponding array in dict, depending on dtype
    dataByType[newData.dtype] = np.append(dataByType[newData.dtype], newData)
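A variant of the same idea, sketched under the assumption that GetData() may return arrays of different dtypes: it keeps a list of arrays per dtype and calls `np.concatenate` once per dtype at the end, which avoids the repeated copying of `np.append` inside the loop. `get_data(i)` is a hypothetical stand-in that alternates between int32 and float32:

```python
import numpy as np

def get_data(i):
    # Hypothetical stand-in for GetData(): alternates between two dtypes
    dtype = np.int32 if i % 2 == 0 else np.float32
    return np.full(2, i, dtype=dtype)

chunks_by_dtype = {}  # dtype -> list of arrays with that dtype
for i in range(10):
    new_data = get_data(i)
    # setdefault creates the list the first time a dtype is seen
    chunks_by_dtype.setdefault(new_data.dtype, []).append(new_data)

# One concatenate per dtype; all inputs in a list share a dtype, so it is preserved
data_by_type = {dt: np.concatenate(chunks) for dt, chunks in chunks_by_dtype.items()}

for dt, arr in data_by_type.items():
    print(dt, arr)
```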

Answer:

Your use of [] and append indicates that you are naively copying this common list idiom:

alist = []
for x in another_list:
    alist.append(x)

Your data is not a clone of the [] list:

In [220]: np.array([])
Out[220]: array([], dtype=float64)

It's an array with shape (0,) and dtype float64.
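A quick check of that empty array's attributes. The second part assumes you know the target dtype up front (int32 here, chosen only for illustration); in that case passing `dtype=` gives an empty array of that type instead:

```python
import numpy as np

empty = np.array([])
print(empty.shape, empty.dtype)  # (0,) float64

typed = np.array([], dtype=np.int32)  # still empty, but with an explicit dtype
print(typed.shape, typed.dtype)  # (0,) int32

# Appending int32 data to an int32 array keeps int32
print(np.append(typed, np.array([1, 2], dtype=np.int32)).dtype)  # int32
```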

np.append is not a clone of list append. I stress that because too many new users make that mistake, and it leads to many different errors. It is really just a cover for np.concatenate, one that takes 2 arguments instead of a list of arguments (and, without an axis argument, flattens both of them first). As the docs stress, it returns a new array, and when used iteratively, that means a lot of copying.
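To illustrate: without an `axis` argument, `np.append` flattens both inputs and concatenates them into a new array, leaving the original untouched. The arrays here are arbitrary examples:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([5, 6])

appended = np.append(a, b)                       # axis=None: both inputs are flattened
manual = np.concatenate((a.ravel(), b.ravel()))  # what np.append does in that case

print(appended)                          # [1 2 3 4 5 6]
print(np.array_equal(appended, manual))  # True
print(a.shape)                           # (2, 2): 'a' itself was not modified
```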

It is best to collect your arrays in a list and give that list to concatenate once. List append is in-place, and better when done iteratively. If you give concatenate a list of arrays, the resulting dtype will be the common one (or whatever type promotion requires). (Newer versions, NumPy 1.20 and later, also let you specify dtype when calling concatenate.)
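A sketch of the list-then-concatenate pattern applied to the question's loop, assuming every call returns an array of the same dtype. `get_data` is a hypothetical stand-in for GetData(), and the structured dtype is invented here to mimic a "custom dtype":

```python
import numpy as np

# Hypothetical structured dtype standing in for the asker's custom dtype
record_dtype = np.dtype([("id", np.int32), ("value", np.float32)])

def get_data(i):
    # Hypothetical stand-in for GetData(): two records per call
    out = np.zeros(2, dtype=record_dtype)
    out["id"] = i
    out["value"] = i * 0.5
    return out

chunks = []  # plain Python list; list.append is in-place and cheap
for i in range(10):
    chunks.append(get_data(i))

data = np.concatenate(chunks)  # one copy at the end; all chunks share a dtype
print(data.dtype == record_dtype, data.shape)  # True (20,)
```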

Keep the numpy documentation at hand (and the Python docs too, if necessary), and look up functions. Pay attention to how they are called, including the keyword parameters. And practice with small examples. I keep an interactive Python session at hand, even when writing answers.

When working with arrays, pay close attention to shape and dtype. Don't make assumptions.

Concatenating two int arrays:

In [238]: np.concatenate((np.array([1,2]),np.array([4,3])))
Out[238]: array([1, 2, 4, 3])

Making one of them a float array (just by adding a decimal point to one number):

In [239]: np.concatenate((np.array([1,2]),np.array([4,3.])))
Out[239]: array([1., 2., 4., 3.])

It won't let me force the result back to int, because casting float64 to int64 is not allowed under the default 'same_kind' casting rule:

In [240]: np.concatenate((np.array([1,2]),np.array([4,3.])), dtype=int)
Traceback (most recent call last):
  File "<ipython-input-240-91b4e3fec07a>", line 1, in <module>
    np.concatenate((np.array([1,2]),np.array([4,3.])), dtype=int)
  File "<__array_function__ internals>", line 180, in concatenate
TypeError: Cannot cast array data from dtype('float64') to dtype('int64') according to the rule 'same_kind'
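If truncating the floats is really what you want, the `casting` keyword (added to `np.concatenate` alongside `dtype` in NumPy 1.20) can relax that rule. A small sketch with made-up values; note that `casting="unsafe"` silently truncates fractional parts:

```python
import numpy as np

a = np.array([1, 2])
b = np.array([4, 3.7])  # float array with a fractional part

# 'unsafe' casting permits float -> int; 3.7 is truncated toward zero
forced = np.concatenate((a, b), dtype=int, casting="unsafe")
print(forced)  # [1 2 4 3]

# Equivalent two-step version that also works on older NumPy
print(np.concatenate((a, b)).astype(int))  # [1 2 4 3]
```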

If one of the elements is a string, np.array(['4',3.]) is itself a string array, so the concatenated result has a string dtype too:

In [241]: np.concatenate((np.array([1,2]),np.array(['4',3.])))
Out[241]: array(['1', '2', '4', '3.0'], dtype='<U32')

Sometimes it is necessary to adjust the dtype after the calculation. Here the strings are converted to float first, and then to int:

In [243]: np.concatenate((np.array([1,2]),np.array(['4',3.]))).astype(float)
Out[243]: array([1., 2., 4., 3.])
In [244]: np.concatenate((np.array([1,2]),np.array(['4',3.]))).astype(float).astype(int)
Out[244]: array([1, 2, 4, 3])
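A sketch of why that conversion goes through float: the string '3.0' is not a valid integer literal, so casting the string array straight to int raises a ValueError (the exact message may vary between NumPy versions), while parsing as float first accepts it:

```python
import numpy as np

mixed = np.concatenate((np.array([1, 2]), np.array(['4', 3.])))
print(mixed)  # ['1' '2' '4' '3.0']

try:
    mixed.astype(int)  # '3.0' is not an integer literal
except ValueError as exc:
    print("direct cast failed:", exc)

# Parsing as float first accepts '3.0'; float -> int then truncates
print(mixed.astype(float).astype(int))  # [1 2 4 3]
```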