I have a use case that I have simplified to the following question:
import numpy as np

def get_matrix(i):  # get an N * M matrix (here N = 4, M = 3)
    return (
        (i, i + 1, i + 1.2),
        (i + 1, i / 2, i * 3.2),
        (i / 3, i * 2, i / 4),
        (i / 5, i * 2.1, i + 2.2),
    )

K = 10000
# build an n-d array of shape K * N * M
arr = np.array(
    tuple(get_matrix(i) for i in range(K)),
    np.float32,
)
However, to get a K*N*M numpy array this way, I first need to create a temporary tuple of shape K*N*M. The tuple can only be garbage collected after the numpy array has been built, so the construction above needs O(K*N*M) extra space.
If I could create the numpy array directly from the iterator (get_matrix(i) for i in range(K)), each N*M matrix could be garbage collected as soon as it has been consumed, so the extra space would be only O(N*M).
I found the method numpy.fromiter(), but I don't know how to write the dtype, even though there is a similar example at the end of its documentation.
import numpy as np

K = 10000
# build an n-d array of shape K * N * M
arr = np.fromiter(
    (get_matrix(i) for i in range(K)),
    dtype=np.float32,  # this raises an error
)
CodePudding user response:
Ah, so this is a new feature of np.fromiter. Just going by the example in the docs, the following worked:
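K = 10000
N = 4
M = 3
# build an n-d array of shape K * N * M
# (get_matrix is the function defined in the question)
arr = np.fromiter(
    (get_matrix(i) for i in range(K)),
    dtype=np.dtype((np.float32, (N, M))),  # subarray dtype: one N * M block per item
    count=K,
)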
Note, I used the count argument for good measure, but it works without it. Passing count lets fromiter pre-allocate the output array instead of resizing it on demand.
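For completeness, here is a self-contained sketch you can run to check the result. It reproduces get_matrix from the question (the `+` operators are my assumption of what was intended) and compares the fromiter result against a plain pre-allocated loop. The helper names build_with_fromiter and build_preallocated are made up for this illustration. As far as I know, subarray dtypes in np.fromiter need NumPy 1.23 or newer, so on older versions the pre-allocated loop is the variant that should still work.

```python
import numpy as np

N, M = 4, 3  # shape of each matrix returned by get_matrix


def get_matrix(i):
    # Same matrix as in the question; the "+" operators are assumed.
    return (
        (i, i + 1, i + 1.2),
        (i + 1, i / 2, i * 3.2),
        (i / 3, i * 2, i / 4),
        (i / 5, i * 2.1, i + 2.2),
    )


def build_with_fromiter(k):
    # Subarray dtype: every item yielded by the iterator is one N x M block.
    sub = np.dtype((np.float32, (N, M)))
    return np.fromiter((get_matrix(i) for i in range(k)), dtype=sub, count=k)


def build_preallocated(k):
    # Hypothetical fallback: allocate the result once and fill it block by
    # block, so only one N x M tuple is alive at a time.
    out = np.empty((k, N, M), dtype=np.float32)
    for i in range(k):
        out[i] = get_matrix(i)
    return out


arr = build_with_fromiter(1000)
ref = build_preallocated(1000)
print(arr.shape, arr.dtype)
print(np.allclose(arr, ref))
```

Both variants consume the matrices one at a time, so neither needs the temporary K*N*M tuple.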