How to do columnwise operations with Numpy structured arrays?

Time:12-23

This shows the problem nicely:

import numpy as np

a_type = np.dtype([("x", int), ("y", float)])
a_list = []

for i in range(0, 8, 2):
    entry = np.zeros((1,), dtype=a_type)
    entry["x"][0] = i
    entry["y"][0] = i + 1.0
    a_list.append(entry)
a_array = np.array(a_list, dtype=a_type)
a_array_flat = a_array.reshape(-1)
print(a_array_flat["x"])
print(np.sum(a_array_flat["x"]))

This produces the following output and traceback:

[0 2 4 6]
Traceback (most recent call last):
  File "/home/andreas/src/masiri/booking_algorythm/demo_structured_aarray_flatten.py", line 14, in <module>
    print(np.sum(a_array_flat["x"]))
  File "<__array_function__ internals>", line 180, in sum
  File "/home/andreas/src/masiri/venv/lib/python3.10/site-packages/numpy/core/fromnumeric.py", line 2298, in sum
    return _wrapreduction(a, np.add, 'sum', axis, dtype, out, keepdims=keepdims,
  File "/home/andreas/src/masiri/venv/lib/python3.10/site-packages/numpy/core/fromnumeric.py", line 86, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
numpy.core._exceptions._UFuncNoLoopError: ufunc 'add' did not contain a loop with signature matching types (dtype({'names': ['x'], 'formats': ['<i8'], 'offsets': [0], 'itemsize': 16}), dtype({'names': ['x'], 'formats': ['<i8'], 'offsets': [0], 'itemsize': 16})) -> None

I chose this data structure because I need to do many column-wise operations fast, and I also have more esoteric types such as timedelta64 and datetime64. I am sure basic NumPy operations work and that I am overlooking something obvious. Please help me.
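
Because every field of a structured array is exposed as an ordinary ndarray, datetime64 and timedelta64 columns support the usual vectorized arithmetic. A minimal sketch, assuming a hypothetical booking layout (the field names id, start and duration are illustrative, not from the question):

```python
import numpy as np

# Hypothetical record layout: the field names are illustrative only.
booking_type = np.dtype([
    ("id", np.int64),
    ("start", "datetime64[m]"),
    ("duration", "timedelta64[m]"),
])

bookings = np.zeros(3, dtype=booking_type)
bookings["id"] = [1, 2, 3]
bookings["start"] = np.array(
    ["2023-01-01T09:00", "2023-01-01T10:30", "2023-01-01T13:00"],
    dtype="datetime64[m]",
)
bookings["duration"] = np.array([30, 45, 60], dtype="timedelta64[m]")

# Each field access yields an ordinary (non-structured) ndarray view,
# so vectorized arithmetic works column by column.
end = bookings["start"] + bookings["duration"]   # datetime64[m] column
total = bookings["duration"].sum()               # timedelta64[m] scalar
gaps = bookings["start"][1:] - end[:-1]          # idle time between bookings

print(end)
print(total)
print(gaps)
```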

Answer:

In an IPython session, your code runs fine:

In [2]: a_type = np.dtype([("x", int), ("y", float)])
   ...: a_list = []
   ...: 
   ...: for i in range(0, 8, 2):
   ...:     entry = np.zeros((1,), dtype=a_type)
   ...:     entry["x"][0] = i
   ...:     entry["y"][0] = i + 1.0
   ...:     a_list.append(entry)
   ...: a_array = np.array(a_list, dtype=a_type)
   ...: a_array_flat = a_array.reshape(-1)

In [3]: a_list
Out[3]: 
[array([(0, 1.)], dtype=[('x', '<i4'), ('y', '<f8')]),
 array([(2, 3.)], dtype=[('x', '<i4'), ('y', '<f8')]),
 array([(4, 5.)], dtype=[('x', '<i4'), ('y', '<f8')]),
 array([(6, 7.)], dtype=[('x', '<i4'), ('y', '<f8')])]

In [4]: a_array
Out[4]: 
array([[(0, 1.)],
       [(2, 3.)],
       [(4, 5.)],
       [(6, 7.)]], dtype=[('x', '<i4'), ('y', '<f8')])

In [5]: a_array_flat
Out[5]: 
array([(0, 1.), (2, 3.), (4, 5.), (6, 7.)],
      dtype=[('x', '<i4'), ('y', '<f8')])

In [6]: a_array_flat['x']
Out[6]: array([0, 2, 4, 6])

In [7]: np.sum(a_array_flat["x"])
Out[7]: 12
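
The same check can be run as a standalone script. Since every entry in a_list already has shape (1,), np.concatenate is one way (an alternative, not the question's approach) to get a flat array directly, whereas np.array stacks the entries into shape (4, 1):

```python
import numpy as np

a_type = np.dtype([("x", int), ("y", float)])

# Build the same four one-element records as in the question.
a_list = []
for i in range(0, 8, 2):
    entry = np.zeros((1,), dtype=a_type)
    entry["x"][0] = i
    entry["y"][0] = i + 1.0
    a_list.append(entry)

# np.array stacks the (1,)-shaped entries into a (4, 1) array ...
stacked = np.array(a_list, dtype=a_type)
# ... whereas np.concatenate joins them end to end into a flat (4,) array.
flat = np.concatenate(a_list)

print(stacked.shape, flat.shape)
print(flat["x"], flat["x"].sum())
```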

The error message looks as if you were indexing with a field list:

In [8]: np.sum(a_array_flat[["x"]])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [8], in <cell line: 1>()
----> 1 np.sum(a_array_flat[["x"]])

File <__array_function__ internals>:5, in sum(*args, **kwargs)

...
TypeError: cannot perform reduce with flexible type

In [9]: a_array_flat[["x"]]
Out[9]: 
array([(0,), (2,), (4,), (6,)],
      dtype={'names':['x'], 'formats':['<i4'], 'offsets':[0], 'itemsize':12})
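
The distinction can be checked directly: a single field name returns a plain numeric array, while a list of field names returns a structured array, which reductions such as np.sum reject. A small sketch; the itemsize comparison assumes NumPy 1.16 or later, where multi-field indexing returns a view that keeps the original record size (compare itemsize 12 above):

```python
import numpy as np
import numpy.lib.recfunctions as rf

a_type = np.dtype([("x", int), ("y", float)])
a = np.array([(0, 1.0), (2, 3.0), (4, 5.0), (6, 7.0)], dtype=a_type)

col = a["x"]      # single field name: plain integer array
sub = a[["x"]]    # list of field names: still a structured array

print(col.dtype, col.dtype.names)                      # names is None
print(sub.dtype.names, sub.dtype.itemsize == a.dtype.itemsize)

# Reducing the structured view fails: there is no 'add' loop for it.
reduce_failed = False
try:
    np.sum(sub)
except TypeError:
    reduce_failed = True
print("np.sum(a[['x']]) raised TypeError:", reduce_failed)

# Either pull the field back out of the view ...
print(sub["x"].sum())
# ... or turn the selected fields into an ordinary 2-D array.
print(rf.structured_to_unstructured(sub).sum(axis=0))
```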

What NumPy version are you using? There was a period when NumPy versions flip-flopped on whether multi-field indexing returns a view or a copy of the array.

Doing the sum on the unflattened array:

In [11]: a_array["x"]
Out[11]: 
array([[0],
       [2],
       [4],
       [6]])

In [12]: a_array["x"].sum()
Out[12]: 12
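
Because a_array has shape (4, 1), its "x" field is a 2-D integer array, so axis-aware reductions behave as they would for any 2-D array. A short sketch, rebuilding the array from literal tuples rather than from a_list:

```python
import numpy as np

a_type = np.dtype([("x", int), ("y", float)])
# Same (4, 1) layout that np.array(a_list) produced in the question.
a_array = np.array([[(0, 1.0)], [(2, 3.0)], [(4, 5.0)], [(6, 7.0)]], dtype=a_type)

x = a_array["x"]          # shape (4, 1), a plain integer array
print(x.shape)
print(x.sum())            # sums every element to a scalar
print(x.sum(axis=0))      # column-wise sum keeps the trailing axis
print(x.ravel().sum())    # flattening first gives the same scalar
```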

Another way of constructing this array:

In [15]: import numpy.lib.recfunctions as rf
In [16]: arr = np.arange(8).reshape(4,2);arr
Out[16]: 
array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7]])

In [17]: arr1 = rf.unstructured_to_structured(arr, dtype=a_type)    
In [18]: arr1
Out[18]: 
array([(0, 1.), (2, 3.), (4, 5.), (6, 7.)],
      dtype=[('x', '<i4'), ('y', '<f8')])

In [19]: arr1['x']
Out[19]: array([0, 2, 4, 6])

or:

In [20]: arr2 = np.zeros(4, a_type)
In [21]: arr2['x']=arr[:,0]; arr2['y']=arr[:,1]
In [22]: arr2
Out[22]: 
array([(0, 1.), (2, 3.), (4, 5.), (6, 7.)],
      dtype=[('x', '<i4'), ('y', '<f8')])
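
A list of tuples can also be passed straight to np.array, since NumPy reads each tuple as one record and fills the fields in order. A minimal sketch reusing a_type:

```python
import numpy as np

a_type = np.dtype([("x", int), ("y", float)])

# Each tuple is one record; its elements fill the fields in order.
records = [(i, i + 1.0) for i in range(0, 8, 2)]
arr3 = np.array(records, dtype=a_type)

print(arr3)
print(arr3["x"].sum(), arr3["y"].mean())
```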

Edit:

I do get your error message with Python's built-in sum (as opposed to np.sum, which I used above):

In [26]: sum(a_array[['x']])
---------------------------------------------------------------------------
UFuncTypeError                            Traceback (most recent call last)
Input In [26], in <cell line: 1>()
----> 1 sum(a_array[['x']])

UFuncTypeError: ufunc 'add' did not contain a loop with signature matching types (dtype('int32'), dtype({'names':['x'], 'formats':['<i4'], 'offsets':[0], 'itemsize':12})) -> None
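
The difference comes from how the built-in sum works: it starts from 0 and adds each element with Python's + operator, which NumPy routes to the add ufunc. A sketch, rebuilding the (4, 1) array from literal tuples, showing that this works for a single field but not for a field-list index:

```python
import numpy as np

a_type = np.dtype([("x", int), ("y", float)])
a_array = np.array([[(0, 1.0)], [(2, 3.0)], [(4, 5.0)], [(6, 7.0)]], dtype=a_type)

# Built-in sum() computes 0 + item0 + item1 + ... with Python's '+'.
# For a plain field array, the items are NumPy integers, so this works.
print(sum(a_array["x"].ravel()))

# For a field-list index, each item is a structured (1,) array, and
# 0 + item has no matching 'add' loop, so NumPy raises a TypeError.
builtin_failed = False
try:
    sum(a_array[["x"]])
except TypeError as exc:
    builtin_failed = True
    print(type(exc).__name__)
```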