Numpy | hex(id()) vs. .data

Time:12-09

I have two questions that I have been dealing with for two days:

If I want to determine the memory address of a NumPy ndarray object and of its elements, I get different addresses with NumPy's .data attribute than with the plain Python functions hex(id()). With hex(id()) it gets really weird: sometimes the elements get the same address, sometimes different ones.

import numpy as np
y = np.array([0,1,2,3])
print(y.data)
print(y[0].data)
print(y[1].data)
print(y[2].data)
print(y[3].data)
print(hex(id(y[0])))
print(hex(id(y[1])))
print(hex(id(y[2])))
print(hex(id(y[3])))

The results are:

<memory at 0x7f9aaa22d870>
<memory at 0x7f9aaa1bd940>
<memory at 0x7f9aaa1bd940>
<memory at 0x7f9aaa1bd940>
<memory at 0x7f9aaa1bd940>
0x7f9aaa31e030

With hex(id()):
0x7f9aaa1c0750
0x7f9aaa1c0730
0x7f9aaa1c0130
0x7f9aaa1c0750

Reply:

Most of these results don't mean what you think they mean, because NumPy's memory layout doesn't work the way you're assuming.

A NumPy array object is not its data buffer. The data buffer is separate. Because of all the metadata an array needs, an array cannot literally be its data buffer, and because NumPy makes heavy use of array views, an array cannot directly contain its buffer either. Many arrays can share the same data buffer, or have overlapping data buffers.
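To see that sharing in action, here is a minimal sketch using a slice, which NumPy returns as a view rather than a copy. It uses only standard NumPy features (the base attribute and np.shares_memory); the variable names are illustrative.

```python
import numpy as np

y = np.array([0, 1, 2, 3])

# A basic slice is a view: a new array object that reuses y's data buffer.
v = y[1:3]

print(v.base is y)             # True: the view records y as the owner of the buffer
print(np.shares_memory(y, v))  # True: the two arrays overlap in memory
print(id(v) != id(y))          # True: they are still two distinct array objects

# Writing through the view changes y, because there is only one buffer.
v[0] = 10
print(y)                       # [ 0 10  2  3]
```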

A NumPy array object contains some metadata and a number of pointers, one of which points to its buffer. If you had done print(hex(id(y))), you would have gotten the address of the array object itself. With print(y.data), you print a memoryview object that exposes the array's data buffer. Be careful with its repr, though: the "at 0x..." is the address of the memoryview object itself, not of the buffer. The buffer's address is available as y.ctypes.data (or y.__array_interface__['data'][0]).
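A small sketch of where each address lives, assuming CPython (where id() happens to return the object's address) and a one-dimensional array. y.ctypes.data and y.__array_interface__["data"][0] both report the single buffer pointer; the per-element addresses below are computed from that one pointer plus the strides metadata, not read from any per-element pointer.

```python
import numpy as np

y = np.array([0, 1, 2, 3])

array_object_addr = id(y)                     # address of the ndarray object (CPython detail)
buffer_addr = y.ctypes.data                   # address of the first byte of the data buffer
same_addr = y.__array_interface__["data"][0]  # the same buffer address, reported another way

print(hex(array_object_addr))
print(hex(buffer_addr), buffer_addr == same_addr)

# One buffer pointer; element i sits strides[0] * i bytes after it.
for i in range(y.shape[0]):
    print(i, hex(buffer_addr + i * y.strides[0]))
```

For a freshly created contiguous 1-D array, strides[0] equals itemsize, so the element addresses are evenly spaced by the element size.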


When you do y[0], that's not really an array element. It's a new array scalar object, representing an immutable scalar with value taken from the first index of y. It does not directly refer to the memory used for y's first element, because when someone does

x = y[0]
y[0] = 1

they don't want the y[0] = 1 assignment to affect x.
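A short illustration of that copy behaviour, contrasting integer indexing (which returns a NumPy scalar) with a one-element slice (which returns a view). The variable names are illustrative.

```python
import numpy as np

y = np.array([0, 1, 2, 3])

x = y[0]    # integer indexing: a NumPy scalar holding a copy of the value
s = y[0:1]  # slicing: a one-element view into y's buffer

y[0] = 1

print(type(x).__name__, x)  # e.g. "int64 0"; the exact integer type depends on the platform
print(s)                    # [1]: the view sees the new value, the scalar does not
```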

The array scalar has its own address and holds its own copy of the value, separate from y's buffer. It also has a very short lifetime, so y[0] and y[1] may end up using the same memory if y[0]'s lifetime ends before you retrieve y[1]. They don't have to use the same memory, but they can.
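The lifetime effect can be observed directly. This sketch assumes CPython; whether the first comparison prints True is an allocator detail and may differ between runs, platforms, or versions.

```python
import numpy as np

y = np.array([0, 1, 2, 3])

# Each y[i] here is a temporary scalar that is freed as soon as id() returns,
# so the next temporary may reuse its memory. Either True or False is possible.
print(id(y[0]) == id(y[1]))

# While both scalars are alive, they are distinct objects with distinct ids.
a, b = y[0], y[1]
print(id(a) != id(b))  # True
print(y[0] is y[0])    # False: every indexing operation builds a new scalar
```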

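When you do print(hex(id(y[0]))), you're printing the address of the array scalar. When you do print(y[0].data), you're printing a memoryview over the array scalar's value, and the address in that repr is the memoryview's own address. Each of those memoryviews (and the scalar behind it) is discarded right after printing, so the next one can be allocated at the same spot, which is likely why you saw 0x7f9aaa1bd940 four times.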
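A sketch that checks what the "<memory at 0x...>" address refers to, assuming CPython, whose memoryview repr prints the address of the memoryview object. The parsing below tolerates platform differences in how that pointer is formatted.

```python
import numpy as np

y = np.array([0, 1, 2, 3])

m = y.data  # a memoryview over y's buffer

# Pull the hex address out of "<memory at 0x...>" and convert it to an int.
repr_addr = int(repr(m).split(" at ")[1].rstrip(">"), 16)

print(repr_addr == id(m))          # True in CPython: the repr shows the memoryview object
print(repr_addr == y.ctypes.data)  # typically False: the buffer lives elsewhere

# The memoryview still exposes the array's data element by element.
print(m.tolist())                  # [0, 1, 2, 3]
print(m.nbytes == y.nbytes)        # True
```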


With all that said, there is almost nothing useful you can do with any of these memory addresses, especially if you're not writing a C extension. If you are writing a C extension, you still probably shouldn't be using any of these addresses directly. Cython is much more convenient than writing C code directly. If you do want to write C to interact with NumPy, you're going to want a much deeper understanding of how NumPy arrays work under the hood, and you should go read the NumPy C API docs.

Reply:

Wow, thank you very much!

But did I understand correctly that all values of the ndarray are in the same buffer?

Does that mean that my array has four pointers pointing to the individual elements in the buffer?

Like in the picture? https://drive.google.com/file/d/1aek3u_kLQ06AeGewsbF_Xd39bxaDwE_4/view?usp=sharing
