I used the following process to generate a NumPy array with size = (720, 720, 3). In principle, it should cost 720 * 720 * 3 * 8 bytes ≈ 12.4 MB (11.9 MiB). However, the line ans = memory_benchmark() reports an increment of 188 MiB. Why does it cost much more memory than expected? I think it should have the same cost as the line m1 = np.ones((720, 720, 3)).
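For reference, the expected size can be double-checked with ndarray.nbytes (a quick check outside the profiled run):

import numpy as np

expected = np.ones((720, 720, 3))   # same shape/dtype (float64) as the result I expect
print(expected.nbytes)              # 12441600 bytes ~= 12.4 MB (11.9 MiB)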
I have the following two environments; both show the same problem.
Environment 1: numpy=1.23.4, memory_profiler=0.61.0, python=3.10.6, macOS 12.6.1 (Intel, not M1)
Environment 2: numpy=1.19.5, memory_profiler=0.61.0, python=3.8.15, macOS 12.6.1 (Intel, not M1)
I did the memory profiling as follows:
import numpy as np
from memory_profiler import profile

@profile
def memory_benchmark():
    m1 = np.ones((720, 720, 3))
    m2 = np.random.randint(128, size=(720, 720, 77, 3))
    a = m2[:, :, :, 0].astype(np.uint16)
    b = m2[:, :, :, 1].astype(np.uint16)
    ans = np.array(m1[b, a].sum(axis=2))
    m2 = None
    a = None
    b = None
    m1 = None
    return ans

@profile
def f():
    ans = memory_benchmark()
    print(ans.shape)
    print("finished")

if __name__ == '__main__':
    f()
(720, 720, 3)
finished
Line # Mem usage Increment Occurrences Line Contents
=============================================================
5 59.3 MiB 59.3 MiB 1 @profile
6 def memory_benchmark():
7 71.2 MiB 11.9 MiB 1 m1 = np.ones((720, 720, 3))
8 984.8 MiB 913.7 MiB 1 m2 = np.random.randint(128, size=(720, 720, 77, 3))
9 1061.0 MiB 76.1 MiB 1 a = m2[:, :, :, 0].astype(np.uint16)
10 1137.1 MiB 76.1 MiB 1 b = m2[:, :, :, 1].astype(np.uint16)
11 1160.9 MiB 23.8 MiB 1 ans = np.array(m1[b, a].sum(axis=2))
12 247.3 MiB -913.6 MiB 1 m2 = None
13 247.3 MiB 0.0 MiB 1 a = None
14 247.3 MiB 0.0 MiB 1 b = None
15 247.3 MiB 0.0 MiB 1 m1 = None
16 247.3 MiB 0.0 MiB 1 return ans
Line # Mem usage Increment Occurrences Line Contents
=============================================================
19 59.3 MiB 59.3 MiB 1 @profile
20 def f():
21 247.3 MiB 188.0 MiB 1 ans = memory_benchmark()
22 247.3 MiB 0.0 MiB 1 print(ans.shape)
23 247.3 MiB 0.0 MiB 1 print("finished")
print(type(m1[0, 0, 0])) yields <class 'numpy.float64'>, print(type(m2[0, 0, 0, 0])) yields <class 'numpy.int64'>, and print(type(ans[0, 0, 0])) yields <class 'numpy.float64'>.
However, in my Ubuntu VM I don't have the above problem.
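The element types can also be read directly from the arrays (a quick check inside memory_benchmark(), before the variables are set to None):

print(m1.dtype, m1.itemsize)    # float64 8
print(m2.dtype, m2.itemsize)    # int64 8
print(ans.dtype, ans.nbytes)    # float64, 12441600 bytes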
CodePudding user response:
Those numbers look fine to me; they match m1 (and ans), m2, and each of a/b, respectively:
In [772]: 720*720*3*8/1e6
Out[772]: 12.4416
In [773]: 720*720*3*8/1e6 * 77
Out[773]: 958.0032
In [775]: 720*720*77*2/1e6
Out[775]: 79.8336
Evidently, once usage drops to 247.3 MiB, the interpreter/NumPy decides to "hang on" to that memory rather than return it to the OS. When tracking memory you are dealing with the "choices" of several layers: the OS, the Python interpreter, and NumPy's own memory management. One or more of those layers can maintain a "free space" from which it can allocate new objects or arrays.
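A minimal sketch of that effect outside of memory_profiler (this assumes psutil is installed; the exact numbers depend on the OS and allocator):

import gc
import numpy as np
import psutil

def rss_mib():
    # Resident set size of this process in MiB, as reported by the OS
    return psutil.Process().memory_info().rss / 2**20

print("before:   ", rss_mib())
big = np.random.randint(128, size=(720, 720, 77, 3))   # ~914 MiB of int64
print("allocated:", rss_mib())
big = None
gc.collect()
# Even after the array is freed, the process may keep part of that memory in
# allocator free lists instead of returning all of it to the OS.
print("released: ", rss_mib())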
CodePudding user response:
I can't reproduce the results you're getting. With python 3.7.3, numpy 1.21.4, and memory_profiler 0.61.0, I get the following results:
Line # Mem usage Increment Occurrences Line Contents
=============================================================
23 57.6 MiB 57.6 MiB 1 @profile
24 def memory_benchmark():
25 69.5 MiB 11.9 MiB 1 m1 = np.ones((720, 720, 3))
26 527.4 MiB 457.8 MiB 1 m2 = np.random.randint(128, size=(720, 720, 77, 3))
27 603.6 MiB 76.3 MiB 1 a = m2[:, :, :, 0].astype(np.uint16)
28 679.9 MiB 76.3 MiB 1 b = m2[:, :, :, 1].astype(np.uint16)
29 692.0 MiB 12.1 MiB 1 ans = np.array(m1[b, a].sum(axis=2))
30 234.3 MiB -457.7 MiB 1 m2 = None
31 158.0 MiB -76.3 MiB 1 a = None
32 81.7 MiB -76.3 MiB 1 b = None
33 69.8 MiB -11.9 MiB 1 m1 = None
34 69.8 MiB 0.0 MiB 1 return ans
(720, 720, 3)
finished
Line # Mem usage Increment Occurrences Line Contents
=============================================================
37 57.6 MiB 57.6 MiB 1 @profile
38 def f():
39 69.8 MiB 12.2 MiB 1 ans = memory_benchmark()
40 69.8 MiB 0.0 MiB 1 print(ans.shape)
41 69.8 MiB 0.0 MiB 1 print("finished")
Printing type(m2[0, 0, 0, 0]) yields <class 'numpy.int32'>, so the 457.8 MiB makes sense (half of the ~914 MiB you see with int64). On the other hand, your output seems weird, given that assigning m1 to None reports no difference in memory. Which Python & library versions are you using?
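One way to take the int32/int64 platform difference out of the picture (a suggestion, not something the original code does) is to pin the dtype of the random array explicitly; np.random.randint accepts a dtype argument:

import numpy as np

# Force int64 everywhere, so the benchmark allocates the same amount of memory
m2 = np.random.randint(128, size=(720, 720, 77, 3), dtype=np.int64)
print(m2.dtype, m2.nbytes / 2**20)               # int64, ~913.6 MiB

# Or the other way around: the values are < 128, so uint8 is enough
m2_small = np.random.randint(128, size=(720, 720, 77, 3), dtype=np.uint8)
print(m2_small.dtype, m2_small.nbytes / 2**20)   # uint8, ~114.2 MiB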
Update: On a different machine, with python 3.10.6, numpy 1.23.5, and memory_profiler 0.61.0, I still cannot reproduce the OP's output.
Line # Mem usage Increment Occurrences Line Contents
=============================================================
5 35.6 MiB 35.6 MiB 1 @profile
6 def memory_benchmark():
7 47.4 MiB 11.8 MiB 1 m1 = np.ones((720, 720, 3))
8 961.2 MiB 913.8 MiB 1 m2 = np.random.randint(128, size=(720, 720, 77, 3))
9 1037.4 MiB 76.2 MiB 1 a = m2[:, :, :, 0].astype(np.uint16)
10 1113.6 MiB 76.2 MiB 1 b = m2[:, :, :, 1].astype(np.uint16)
11 1125.8 MiB 12.2 MiB 1 ans = np.array(m1[b, a].sum(axis=2))
12 212.1 MiB -913.6 MiB 1 m2 = None
13 136.0 MiB -76.1 MiB 1 a = None
14 59.9 MiB -76.1 MiB 1 b = None
15 48.0 MiB -11.9 MiB 1 m1 = None
16 48.0 MiB 0.0 MiB 1 return ans
(720, 720, 3)
finished
Line # Mem usage Increment Occurrences Line Contents
=============================================================
19 35.6 MiB 35.6 MiB 1 @profile
20 def f():
21 48.0 MiB 12.3 MiB 1 ans = memory_benchmark()
22 48.0 MiB 0.0 MiB 1 print(ans.shape)
23 48.0 MiB 0.0 MiB 1 print("finished")