Build a numpy d-dim array from an iterator of (d-1)-dim arrays

I have a use case that I have simplified to the following question:

import numpy as np

def get_matrix(i): # get a matrix N * M
    return (
        (i, i + 1, i + 1.2),
        (i + 1, i / 2, i * 3.2),
        (i / 3, i * 2, i / 4),
        (i / 5, i * 2.1, i + 2.2),
    )

K = 10000
# build a n-d array K * N * M
arr = np.array(
    tuple(get_matrix(i) for i in range(K)), 
    np.float32,
)

However, to get the K*N*M numpy array this way, I first have to create a temporary tuple of shape K*N*M. The tuple can only be garbage collected after the numpy array has been built, so this construction needs O(K*N*M) extra space.

If I could create the numpy array from the iterator (get_matrix(i) for i in range(K)) instead, each N*M matrix could be garbage collected as soon as it has been consumed, so the extra space would be only O(N*M).
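For comparison, here is a minimal sketch of one way to reach that O(N*M) bound without any special constructor, assuming K, N and M are known up front: allocate the result once with np.empty and copy one matrix at a time. This is a plain preallocate-and-fill loop, not the fromiter approach asked about below.

```python
import numpy as np

def get_matrix(i):
    # Same N x M (4 x 3) matrix as in the question.
    return (
        (i, i + 1, i + 1.2),
        (i + 1, i / 2, i * 3.2),
        (i / 3, i * 2, i / 4),
        (i / 5, i * 2.1, i + 2.2),
    )

# Assumption: the dimensions are known before iterating.
K, N, M = 10000, 4, 3

# Allocate the final K x N x M array once.
arr = np.empty((K, N, M), dtype=np.float32)

# Copy one matrix at a time; each temporary tuple can be
# garbage collected as soon as it has been copied.
for i in range(K):
    arr[i] = get_matrix(i)
```

The loop runs in Python, so it may be slower than a single constructor call, but only one N*M tuple is alive at any time.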

I found the method numpy.fromiter(), but I don't know how to write the dtype, even though its documentation ends with a similar example.

import numpy as np

K = 10000
# build a n-d array K * N * M
arr = np.fromiter(
    (get_matrix(i) for i in range(K)), 
    dtype=np.float32,  # this raises an error
)

CodePudding user response：

Ah, so this is a new feature of np.fromiter (subarray dtypes are supported as of NumPy 1.23, as far as I can tell). Just going by the example in the docs, the following worked:

K = 10000
N = 4
M = 3

# build a n-d array K * N * M
arr = np.fromiter(
    (get_matrix(i) for i in range(K)), 
    dtype=np.dtype((np.float32, (N, M))),
    count=K
)

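Note: I passed the count argument for good measure. It works without it, but according to the docs, specifying count lets fromiter pre-allocate the output array instead of resizing it on demand.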
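As a quick sanity check, here is a self-contained sketch that compares the fromiter result with the tuple-based construction from the question. It assumes a NumPy version whose fromiter accepts subarray dtypes, and it uses a small K (my choice, not from the question) so the comparison stays cheap.

```python
import numpy as np

def get_matrix(i):
    # Same N x M (4 x 3) matrix as in the question.
    return (
        (i, i + 1, i + 1.2),
        (i + 1, i / 2, i * 3.2),
        (i / 3, i * 2, i / 4),
        (i / 5, i * 2.1, i + 2.2),
    )

# Small K chosen only to keep the comparison cheap.
K, N, M = 100, 4, 3

# Subarray dtype: each item yielded by the iterator is an N x M block of float32.
sub_dtype = np.dtype((np.float32, (N, M)))

from_iter = np.fromiter(
    (get_matrix(i) for i in range(K)),
    dtype=sub_dtype,
    count=K,
)

# Reference result built through the temporary tuple.
from_tuple = np.array(tuple(get_matrix(i) for i in range(K)), np.float32)

print(from_iter.shape, from_iter.dtype)
print(np.array_equal(from_iter, from_tuple))
```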