I have an array of numbers:
num_arr = np.array([1,2,3,4,5,6,7])
I need to transform each number into an object:
class MyObj:
    def __init__(self, x):
        self.val = x
What would be the best way to do that? Is there a way to do it without using loops?
CodePudding user response:
You can use map, but the performance gain over a list comprehension is not significant.
import numpy as np
class MyObj:
    def __init__(self, x):
        self.val = x
n = 100000
num_arr = np.arange(n)
%timeit -n 10 -r 7 np.array([MyObj(i) for i in num_arr])
> 167 ms ± 1.97 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit -n 10 -r 7 np.array(list(map(MyObj, num_arr)))
> 163 ms ± 1.74 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
@Faulheit's np.vectorize approach is faster than both of these methods at this size, so I would recommend accepting that answer if you are satisfied with this result.
vfunc = np.vectorize(MyObj)
%timeit -n 10 -r 7 vfunc(num_arr)
> 34.4 ms ± 813 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Additional tests for small arrays, as @hpaulj suggested:
n = 5
num_arr = np.arange(n)
%timeit -n 10 -r 7 np.array([MyObj(i) for i in num_arr])
> 11.2 µs ± 2.2 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit -n 10 -r 7 np.array(list(map(MyObj, num_arr)))
> 9.91 µs ± 837 ns per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit -n 10 -r 7 vfunc(num_arr)
> 26.8 µs ± 9.65 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
n = 100
num_arr = np.arange(n)
%timeit -n 10 -r 7 np.array([MyObj(i) for i in num_arr])
> 176 µs ± 42 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit -n 10 -r 7 np.array(list(map(MyObj, num_arr)))
> 160 µs ± 8.36 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit -n 10 -r 7 vfunc(num_arr)
> 53.7 µs ± 13.1 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
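If you want to rerun this comparison outside IPython, here is a minimal sketch using the standard-library timeit module. The helper names (with_comprehension, with_map, with_vectorize, benchmark) are made up for illustration, and it reports the best run rather than the mean ± std. dev. that %timeit prints, so the numbers will not line up exactly with the ones above.

```python
import timeit

import numpy as np


class MyObj:
    def __init__(self, x):
        self.val = x


# Hypothetical helper names, one per approach compared above.
def with_comprehension(arr):
    return np.array([MyObj(i) for i in arr])


def with_map(arr):
    return np.array(list(map(MyObj, arr)))


vfunc = np.vectorize(MyObj)


def with_vectorize(arr):
    return vfunc(arr)


def benchmark(n, number=5, repeat=3):
    """Return the best per-call time in seconds for each approach at size n."""
    num_arr = np.arange(n)
    approaches = [
        ("comprehension", with_comprehension),
        ("map", with_map),
        ("vectorize", with_vectorize),
    ]
    results = {}
    for name, func in approaches:
        times = timeit.repeat(lambda: func(num_arr), number=number, repeat=repeat)
        results[name] = min(times) / number
    return results


if __name__ == "__main__":
    for n in (5, 100, 10_000):
        for name, seconds in benchmark(n).items():
            print(f"n={n:>6} {name:<14} {seconds * 1e6:10.1f} µs per call")
```

Timings depend on the machine and the NumPy version, so treat the output as relative rather than absolute.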
CodePudding user response:
To map a function over a NumPy array, you can use np.vectorize:
import numpy as np

class MyObj:
    def __init__(self, x):
        self.val = x
num_arr = np.array([1,2,3,4,5,6,7])
vfunc = np.vectorize(MyObj)
result = vfunc(num_arr)
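To make it clearer what comes back, here is a small sketch that inspects the result. Passing otypes=[object] is my own addition, not part of the snippet above: according to the np.vectorize documentation, when otypes is omitted the output type is determined by calling the function on the first element, so stating it explicitly avoids that extra call.

```python
import numpy as np


class MyObj:
    def __init__(self, x):
        self.val = x


num_arr = np.array([1, 2, 3, 4, 5, 6, 7])

# otypes=[object] is an assumption added here: it tells np.vectorize up front
# that every output element is an arbitrary Python object.
vfunc = np.vectorize(MyObj, otypes=[object])
result = vfunc(num_arr)

# result is an object-dtype array with the same shape as num_arr,
# and each element is a MyObj wrapping the corresponding number.
values = [obj.val for obj in result]
print(result.dtype, result.shape, values)
```

The result keeps the shape of the input, so the same call should also work on 2-D arrays.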
Edit:
Note that np.vectorize is meant for convenience, not performance. From the documentation (https://numpy.org/doc/stable/reference/generated/numpy.vectorize.html):
"The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop."
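A closely related option, not mentioned elsewhere in this thread, is np.frompyfunc, which wraps a Python callable as a ufunc that always returns object arrays. Here is a minimal sketch, assuming the same MyObj class; the arguments 1, 1 mean one input and one output. It is still a Python-level call per element, so I would not expect it to beat the loop by a large factor.

```python
import numpy as np


class MyObj:
    def __init__(self, x):
        self.val = x


num_arr = np.array([1, 2, 3, 4, 5, 6, 7])

# np.frompyfunc(func, nin, nout): one input, one output.
# The resulting ufunc always produces an object-dtype array.
make_obj = np.frompyfunc(MyObj, 1, 1)
result = make_obj(num_arr)

print(result.dtype, result.shape, [obj.val for obj in result])
```

Because the output is always object dtype, there is no need to specify an output type as with np.vectorize.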
CodePudding user response:
Someone suggested using np.nditer. Here's a call that works. Note that it is slower than the other approaches, and also a lot more complex to use.
In [153]: %%timeit
     ...: it = np.nditer((num_arr,None), flags=('refs_ok',), op_flags=(('readonly',), ('writeonly','allocate')), op_dtypes=(int,object))
     ...: with it:
     ...:     for (a,b) in it:
     ...:         b[...] = MyObj(a.item())
     ...:     res = it.operands[1]
376 ms ± 9.74 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [154]: timeit vfunc(num_arr)
39.9 ms ± 974 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [155]: timeit np.array([MyObj(i) for i in num_arr.tolist()])
256 ms ± 4.28 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)