Cython caching with fused type-CodePudding

I am trying to add fused type to scipy.stats.qmc._sobol.pyx.

The issue is that we are caching two matrices so that next runs of the functions would not require to load the matrices. We declare them as follow and use some global variable in a function to fill these or not.

cdef cnp.uint64_t poly[MAXDIM]
cdef cnp.uint64_t vinit[MAXDIM][MAXDEG]

If I try to use fused type instead of cnp.uint64_t, I have a compilation error. Indeed, Cython cannot decide on the type here.

The solution I thought about is to declare twice the matrices. One for cnp.uint32_t and one for cnp.uint64_t, then I can detect if I need the first or second set in each functions. But I am afraid it would increase the memory footprint. I thought about freeing one of the array but then if the user calls concurrently the matrices and ask for 32 and 64 bits then it might break.

cdef cnp.uint32_t poly_32[MAXDIM]
cdef cnp.uint32_t vinit_32[MAXDIM][MAXDEG]

cdef cnp.uint64_t poly_64[MAXDIM]
cdef cnp.uint64_t vinit_64[MAXDIM][MAXDEG]

Is there an alternative way to cache the matrices and use fused type? Matrices needs to either be cnp.uint32_t or cnp.uint64_t. Not that someone on a 64 bits architecture can ask to use the 32 bits of the functions, so I cannot really restrict 64 bits on 64 bits architecture.

Here is some more or less complete code to explain the whole logic in Cython and Python:

cdef cnp.uint64_t poly[MAXDIM]
cdef cnp.uint64_t vinit[MAXDIM][MAXDEG]

cdef bint is_initialized = False

def _initialize_direction_numbers():
    global is_initialized
    if not is_initialized:
        for i in range(...):
            poly[i] = ...
            vinit[i] = ...
        is_initialized = True

def _initialize_v(...):
    # use the cached values
    for i in range(...):
        ... = poly[i]
        ... = vinit[i]

In Python I have

_initialize_direction_numbers()
_initialize_v(...)

Calling these 2 functions again would not load again the matrices because is_initialized = True.

CodePudding user response：

I wouldn't do this with C arrays (i.e. cdef cnp.uint64_t poly[MAXDIM]). They have the disadvantages that:

they use the memory whether or not they're actually initialized
they're fairly likely to generate lack stack-allocated temporaries, which can cause errors (although the arrays themselves will not be stack allocated).

Instead I would probably use a dict of Numpy arrays. This doesn't actually involve using fused types.

_poly_dict = {}
_vinit_dict = {}

def get_poly(dtype):
    poly = _poly_dict.get(dtype)
    if not poly:
       _poly_dict[dtype] = np.empty(..., dtype=dtype)
       # ... initialize it
    return poly

# etc.

What you can then do is create memoryviews of these arrays (possibly within a fused function). Memoryviews are very quick to create since they only access existing memory. Something like

cdef fused int32or64:
   cnp.uint32_t
   cnp.uint64_t

def do_calculation(int32or64 user_value):
   # slightly awkward conversion from ctype to Numpy dtype 
   #  - if you have to do this often the use a helper function
   cdef int32or64[:] poly = get_poly(np.int32 if int32or64 is cnp.uint32_t else np.int64)
   # your calculation goes here...

As an aside, if you wanted to use a memoryview of a fused type in get_poly (for example, to initialize the array), it's often useful to add a dummy argument:

def get_poly(dtype, int32or64 dummy):
   ...

That lets you generate it as a fused function (so avoiding duplicating the code) even if it doesn't have a natural "input".