What is the best way to create an object from each element in a python (numpy) array?

Time:07-08

I have an array of numbers:

import numpy as np
num_arr = np.array([1,2,3,4,5,6,7])

I need to transform each number into an object:

class MyObj:
    def __init__(self, x):
        self.val = x

What would be the best way to do that? Is there a way to do it without using loops?

CodePudding user response:

You can use map, but the performance gain over a list comprehension is not significant.

import numpy as np

class MyObj:
    def __init__(self, x):
        self.val = x

n = 100000
num_arr = np.arange(n)

%timeit -n 10 -r 7 np.array([MyObj(i) for i in num_arr])
> 167 ms ± 1.97 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit -n 10 -r 7 np.array(list(map(MyObj, num_arr)))
> 163 ms ± 1.74 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

@Faulheit's np.vectorize approach is faster than both of these at this size. (I would therefore recommend accepting that answer if you are satisfied with this result.)

vfunc = np.vectorize(MyObj)
%timeit -n 10 -r 7 vfunc(num_arr)
> 34.4 ms ± 813 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
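
Outside IPython the %timeit magic is not available. A minimal sketch of the same comparison with the standard-library timeit module might look like the following. The repeat counts and the labels are arbitrary choices made for illustration, and absolute numbers will differ from machine to machine.

```python
import timeit

import numpy as np


class MyObj:
    def __init__(self, x):
        self.val = x


n = 100_000
num_arr = np.arange(n)
vfunc = np.vectorize(MyObj)

# Candidate implementations, keyed by a descriptive label (labels are illustrative).
candidates = {
    "list comprehension": lambda: np.array([MyObj(i) for i in num_arr]),
    "map": lambda: np.array(list(map(MyObj, num_arr))),
    "np.vectorize": lambda: vfunc(num_arr),
}

timings = {}
for label, fn in candidates.items():
    # Best of 3 repeats with 2 calls each, reported per call in milliseconds.
    best = min(timeit.repeat(fn, number=2, repeat=3)) / 2
    timings[label] = best * 1e3
    print(f"{label:>20}: {timings[label]:.1f} ms per call")
```

Taking the minimum over repeats rather than the mean is a common choice for timeit because it is less sensitive to background load.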

Additional tests on small arrays, as @hpaulj suggested:

n = 5
num_arr = np.arange(n)  # rebuild the input at the new size

%timeit -n 10 -r 7 np.array([MyObj(i) for i in num_arr])
> 11.2 µs ± 2.2 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit -n 10 -r 7 np.array(list(map(MyObj, num_arr)))
> 9.91 µs ± 837 ns per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit -n 10 -r 7 vfunc(num_arr)
> 26.8 µs ± 9.65 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

n = 100
num_arr = np.arange(n)  # rebuild the input at the new size

%timeit -n 10 -r 7 np.array([MyObj(i) for i in num_arr])
> 176 µs ± 42 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit -n 10 -r 7 np.array(list(map(MyObj, num_arr)))
> 160 µs ± 8.36 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit -n 10 -r 7 vfunc(num_arr)
> 53.7 µs ± 13.1 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

CodePudding user response:

To map a function over a NumPy array, you can use np.vectorize:

import numpy as np

class MyObj:
    def __init__(self, x):
        self.val = x

num_arr = np.array([1,2,3,4,5,6,7])
vfunc = np.vectorize(MyObj)
result = vfunc(num_arr)
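
As a hedged variation on the call above: the NumPy documentation says that np.vectorize infers the output dtype by calling the function on the first element, and that passing otypes avoids this. Since the result here is always an object array, a sketch with otypes=[object] could look like this (the empty-array case at the end is added for illustration and is not part of the original answer):

```python
import numpy as np


class MyObj:
    def __init__(self, x):
        self.val = x


num_arr = np.array([1, 2, 3, 4, 5, 6, 7])

# otypes=[object] states the output dtype up front, so np.vectorize does not
# need a trial call on the first element to infer it.
vfunc = np.vectorize(MyObj, otypes=[object])
result = vfunc(num_arr)

print(result.dtype)   # object
print(result.shape)   # (7,)
print([o.val for o in result])

# With otypes set, an empty input is also accepted.
empty = vfunc(np.array([], dtype=int))
print(empty.shape)    # (0,)
```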

Edit:

Note that np.vectorize is meant for convenience, not for performance. From the documentation:

https://numpy.org/doc/stable/reference/generated/numpy.vectorize.html

"The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop."
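
A related NumPy helper, np.frompyfunc, wraps a Python callable as a ufunc that returns object arrays. It is swapped in here purely for illustration and is not what the answer above proposes. The sketch below sets it beside the explicit loop it stands in for, so the "essentially a for loop" remark can be checked by comparing the two results. The names make_obj and looped are made up for this example.

```python
import numpy as np


class MyObj:
    def __init__(self, x):
        self.val = x


num_arr = np.array([1, 2, 3, 4, 5, 6, 7])

# Wrap the constructor as a ufunc with 1 input and 1 output.
make_obj = np.frompyfunc(MyObj, 1, 1)
result = make_obj(num_arr)

# The explicit loop that this stands in for, filling a preallocated object array.
looped = np.empty(num_arr.shape, dtype=object)
for i, x in enumerate(num_arr.tolist()):
    looped[i] = MyObj(x)

print(result.dtype, result.shape)
# Both approaches wrap the same values, in the same order.
print([o.val for o in result] == [o.val for o in looped])
```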

CodePudding user response:

Someone suggested using np.nditer. Here is a call that works, but note that it is slower than the alternatives and a lot more complex to use.

In [153]: %%timeit
     ...: it = np.nditer((num_arr,None), flags=('refs_ok',), op_flags=(('readonly',), ('writeonly','allocate')), op_dtypes=(int,object))
     ...: with it:
     ...:     for (a,b) in it:
     ...:         b[...] = MyObj(a.item())
     ...:     res = it.operands[1]
376 ms ± 9.74 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [154]: timeit vfunc(num_arr)
39.9 ms ± 974 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [155]: timeit np.array([MyObj(i) for i in num_arr.tolist()])
256 ms ± 4.28 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
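
For comparison, the same element-by-element fill can be written without nditer by preallocating an object array and indexing into it. This is only a sketch and makes no claim about speed. The 2-D input is a made-up example, chosen to show that the output keeps the input's shape.

```python
import numpy as np


class MyObj:
    def __init__(self, x):
        self.val = x


# Hypothetical 2-D input, chosen only to show that the shape carries over.
num_arr = np.arange(6).reshape(2, 3)

# Preallocate the output; every slot of an empty object array starts as None.
res = np.empty(num_arr.shape, dtype=object)

# np.ndindex yields each index tuple of the given shape.
for idx in np.ndindex(num_arr.shape):
    # .item() converts the NumPy scalar to a plain Python int, as in the nditer call.
    res[idx] = MyObj(num_arr[idx].item())

print(res.shape)        # (2, 3)
print(res[1, 2].val)    # 5
```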