Wrong `nbytes` value in a numpy array after broadcasting with `broadcast_to`

I just noticed this:

import numpy as np
import sys

arr = np.broadcast_to(0, (10, 1000000000000))
print(arr.nbytes)  # prints "80000000000000"
print(sys.getsizeof(arr))  # prints "120"

Is this a bug or intended behavior? That is, is nbytes meant to hold the number of "logical" bytes, without accounting for zero strides?

Answer:

While I don't see it documented explicitly, nbytes appears to be the number of elements (the product of the shape) times itemsize, i.e. arr.size * arr.itemsize.

In all the examples I've looked at, nbytes uses the array's own shape/size, not that of its base, so I wouldn't read too much into the word "consumed" used in the docs.

Your example:

In [117]: arr = np.broadcast_to(0,(1,2,3))
In [119]: arr.shape, arr.strides, arr.nbytes
Out[119]: ((1, 2, 3), (0, 0, 0), 24)
In [120]: arr.base
Out[120]: array(0)
In [121]: arr.base.nbytes
Out[121]: 4

The broadcasted array is a view of a much smaller one; nbytes reflects its own shape, not the shape of the base.

To take another example, where the view is a subset of the base:

In [122]: np.arange(100).nbytes
Out[122]: 400
In [123]: np.arange(100)[::4].nbytes
Out[123]: 100
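
As a rough check of the size * itemsize relationship, the sketch below compares nbytes with size * itemsize for a broadcast view and for a strided slice. The variable names are only illustrative, and the dtype is fixed to int64 (an assumption made here) so that the numbers do not depend on the platform's default integer.

```python
import numpy as np

# Fix the dtype so itemsize is 8 on every platform (an assumption for this sketch).
scalar = np.array(0, dtype=np.int64)
big = np.broadcast_to(scalar, (10, 1000))   # all strides are 0; one element is stored
sub = np.arange(100, dtype=np.int64)[::4]   # 25 of the base's 100 elements

for view in (big, sub):
    # nbytes follows the view's own shape, however it maps onto the base
    assert view.nbytes == view.size * view.itemsize

print(big.strides, big.nbytes)  # (0, 0) 80000
print(sub.strides, sub.nbytes)  # (32,) 200
```

In both cases nbytes says nothing about how many bytes of the base buffer the view actually touches.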

The code for broadcast_to is viewable at np.lib.stride_tricks._broadcast_to. It uses np.nditer to generate the new view.

sys.getsizeof does a reasonable job of returning the memory use of an array that owns its data (i.e. base is None). It does not provide any useful information for a view.
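
A small sketch of that difference, assuming a one-million-element uint8 array chosen only for illustration. The exact header size that getsizeof reports is an implementation detail, so only inequalities are shown.

```python
import sys
import numpy as np

owner = np.zeros(1_000_000, dtype=np.uint8)   # owns its 1,000,000-byte buffer
view = owner[:]                               # same data, but view.base is owner

assert owner.base is None and view.base is owner

# For the owning array, getsizeof covers the data buffer plus a small header.
print(sys.getsizeof(owner) >= owner.nbytes)   # True
# For the view, only the small header is counted, although nbytes is identical.
print(sys.getsizeof(view) < view.nbytes)      # True
```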

sliding_window_view

Another example of striding tricks used to make a "larger" array:

In [180]: arr = np.arange(16).reshape(4,4).copy()
In [181]: arr.shape, arr.strides, arr.nbytes
Out[181]: ((4, 4), (16, 4), 64)

In [182]: res = np.lib.stride_tricks.sliding_window_view(arr,(2,2))    
In [183]: res.shape, res.strides, res.nbytes
Out[183]: ((3, 3, 2, 2), (16, 4, 16, 4), 144)

It's a view of the original (4,4) arr:

In [184]: res.base
Out[184]: <numpy.lib.stride_tricks.DummyArray at 0x1fa8e7cc730>
In [185]: res.base.base
Out[185]: 
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])
In [186]: res.base.base is arr
Out[186]: True
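
If the goal is to estimate how much memory actually backs a view, one option is to follow the .base links to the object at the end of the chain and look at its size. The helper below, root_of, is a hypothetical name and only a sketch: it assumes every link exposes a base attribute, and the final object may not be an ndarray at all (for example when an array wraps an external buffer), in which case it has no nbytes to inspect.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def root_of(view):
    """Follow .base links until an object without a base is reached (hypothetical helper)."""
    obj = view
    while getattr(obj, "base", None) is not None:
        obj = obj.base
    return obj

arr = np.arange(16).reshape(4, 4).copy()   # copy() makes arr own its data
res = sliding_window_view(arr, (2, 2))

root = root_of(res)
print(root is arr)            # True
print(res.size, root.size)    # 36 16
```

The view reports 36 elements, yet the memory behind it is the 16-element arr.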