What is numpy.ndarray.flags.contiguous about?

Time:05-19

While experimenting with NumPy, I found that the contiguous value reported by numpy.info may differ from numpy.ndarray.data.contiguous (see the code below).

import numpy as np

x = np.arange(9).reshape(3,3)[:,(0,1)]

np.info(x)

print(f'''
{x.data.contiguous = }
{x.flags.contiguous = }

{x.data.c_contiguous = }
{x.flags.c_contiguous = }

{x.data.f_contiguous = }
{x.flags.f_contiguous = }
''')

According to experimental output

CodePudding user response:

x = np.arange(9).reshape(3,3)[:,(0,1)]

np.arange(9) produces a 1d array; reshape(3,3) reshapes it to 2d. It's a view of the original arange. Without an order parameter, reshape sticks with the default c-order.

The [:,(0,1)] is advanced indexing, making a copy. Indexing with [:,:2] would select the same values, but make a view.
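A quick way to check the copy/view difference is np.shares_memory. This is only a sketch; the names base, adv and sl are mine, not from the question.

```python
import numpy as np

base = np.arange(9).reshape(3, 3)

adv = base[:, (0, 1)]   # advanced indexing: allocates a new array
sl = base[:, :2]        # basic slicing: a view onto base's buffer

print(np.array_equal(adv, sl))       # both select the same values
print(np.shares_memory(base, adv))   # the copy does not share memory with base
print(np.shares_memory(base, sl))    # the view does
```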

np.info reports the strides of x as (8,24). The reshaped array, before the indexing, has strides (24,8): it steps by 8 bytes along the last dimension and 3*8 along the first. But the advanced indexing flips things around - that's a detail of indexing that we usually ignore (or are unaware of).

A 2d array whose first stride is the smaller one is F-order.
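To see the strides for yourself, something like the following should do. It assumes 8-byte integers (I force dtype=np.int64 so the numbers match the ones above), and f_copy is just an illustrative name for an array that is F-ordered by construction. The layout that advanced indexing picks for x is an implementation detail, so I only print it here rather than rely on it.

```python
import numpy as np

base = np.arange(9, dtype=np.int64).reshape(3, 3)
print(base.strides)      # C-order: 24 bytes per row step, 8 per column step

# An explicitly F-ordered (3, 2) array, for comparison.
f_copy = np.asfortranarray(base[:, :2])
print(f_copy.strides)    # F-order: the first stride is the small one

# The array from the question; compare its strides with the two above.
x = base[:, (0, 1)]
print(x.strides, x.flags.c_contiguous, x.flags.f_contiguous)
```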

I won't try to decipher all the data/flags contiguous prints, but the basic layout is obvious to me from the shape and strides. I think the strides have priority, and all the 'contiguous' displays are derivative, interpretations, so to speak, of the strides.

With 3d (or higher) arrays, the contiguous alternatives can break down. It would be possible to make an array with strides like (48,8,24), where the middle dimension steps the fastest. That's neither c nor f contiguous.
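One way to get such strides, as a sketch: swap the last two axes of an ordinary C-ordered 3d array. The shape (4, 2, 3) and the int64 dtype are my choices, picked so that the strides come out as (48, 8, 24).

```python
import numpy as np

a = np.zeros((4, 2, 3), dtype=np.int64)   # C-order, strides (48, 24, 8)
t = a.transpose(0, 2, 1)                   # view with the last two axes swapped

print(t.shape, t.strides)                  # middle dimension now has the smallest stride
print(t.flags.c_contiguous, t.flags.f_contiguous)
```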

I might add that unless you are doing something like

np.arange(9).reshape(3,3, order='F')

the type of contiguity is usually not something we worry about. Some functions (esp. compiled ones) require a certain contiguity. And some operations are faster (or slower) depending on which dimension is 'inner-most'. But for ordinary numpy use I don't pay much attention to the flags. I used numpy for years before realizing that your example indexing flipped the order.
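If a function does insist on a particular layout, np.ascontiguousarray (or np.asfortranarray) will hand back an array in that order, copying only when needed. A small sketch, with f and c as illustrative names:

```python
import numpy as np

f = np.arange(9).reshape(3, 3, order='F')   # F-ordered view of the arange
c = np.ascontiguousarray(f)                 # same values, C-ordered copy

print(f.flags.f_contiguous, f.flags.c_contiguous)
print(c.flags.f_contiguous, c.flags.c_contiguous)
print(np.array_equal(f, c))                 # the values are unchanged
```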

You could have just displayed x.flags. I'm not sure what displaying x.data does for you.

CodePudding user response:

Let's see how numpy.info works. From the source code we can see the subroutine for processing ndarray:

def info(object=None, maxwidth=76, output=None, toplevel='numpy'):
    ...
    elif isinstance(object, ndarray):
        _info(object, output=output)
    ...

def _info(obj, output=None):
    """Provide information about ndarray obj"""
    bp = lambda x: x
    ...
    print("contiguous: ", bp(obj.flags.contiguous), file=output)
    print("fortran: ", obj.flags.fortran, file=output)
    ...

It prints flags.contiguous as the array's contiguity value. This attribute isn't listed in the flags description, but we can find it in flagsobject.c:

// ...
static PyGetSetDef arrayflags_getsets[] = {
    {"contiguous",
        (getter)arrayflags_contiguous_get,
        NULL,
        NULL, NULL},
    {"c_contiguous",
        (getter)arrayflags_contiguous_get,
        NULL,
        NULL, NULL},
// ...

We can see here that the contiguous value from numpy.info is actually flags.c_contiguous and has nothing to do with ndarray.data.contiguous. I guess when programming in C it was natural to say just contiguous instead of c_contiguous, and this has led to a slight inconsistency in terminology.
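The difference shows up on any array that is F-ordered but not C-ordered. In the sketch below I build one explicitly with np.asfortranarray so the layout doesn't depend on indexing details. As far as I understand, x.data is a plain Python memoryview, and its contiguous attribute is true when the buffer is contiguous in either order.

```python
import numpy as np

x = np.asfortranarray(np.arange(6).reshape(3, 2))   # F-ordered by construction

flags = x.flags
# flags.contiguous shares its getter with flags.c_contiguous
print(flags.contiguous, flags.c_contiguous, flags.f_contiguous)

mv = x.data   # memoryview over the same buffer
# memoryview.contiguous accepts either C or Fortran layout
print(mv.contiguous, mv.c_contiguous, mv.f_contiguous)
```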
