I wrote Python code that calls C code through ctypes.
The C function is called many times in a for loop.
The C code is as follows:
- test.h
#include<Python.h>
PyObject *getFeature(wchar_t *text);
// where the unigram is a Set Object with type 'PySetObject'
- test.c
#include "test.h"
PyObject *getFeature(wchar_t *text)
{
int ret = -1;
PyObject *featureList = PyList_New(0);
PyObject *curString = PyUnicode_FromWideChar(text, 2);
ret = PyList_Append(featureList, curString);
Py_DECREF(curString);
return featureList;
}
Then I compiled it into a shared library called libtest.so and load that .so file into the Python code with ctypes like below:
- test.py
import os
import ctypes
dir_path = 'path/to/the'  # directory that contains libtest.so
feature_extractor = ctypes.PyDLL(
    os.path.join(dir_path, 'libtest.so'))
get_feature_c = feature_extractor.getFeature
get_feature_c.argtypes = [
    ctypes.c_wchar_p, ctypes.py_object]
get_feature_c.restype = ctypes.py_object
def get_feature(text):
    return [text[:2]]
times = 100000
for i in range(times):
    res = get_feature_c('ncd')  # memory usage keeps growing.
for i in range(times):
    res = get_feature('ncd')  # memory usage stays constant.
I monitor the program's memory usage with the top command and find that it grows in proportion to the number of loop iterations. But with the pure-Python function, memory usage stays at a steady size.
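As an in-process alternative to watching top, Python's standard tracemalloc module can report how much traced memory a loop leaves behind. This is a minimal sketch: leaky_feature and _leaked are hypothetical names that simulate objects that are never released, and growth_bytes is a hypothetical helper. Note that tracemalloc only sees allocations made through Python's own allocators (which PyList_New and PyUnicode_FromWideChar use), not plain malloc() calls in C code.

```python
import tracemalloc

_leaked = []  # hypothetical: holds references that are never dropped


def leaky_feature(text):
    """Hypothetical leaking version: keeps an extra reference forever."""
    result = [text[:2]]
    _leaked.append(result)  # this extra reference is never released
    return result


def get_feature(text):
    """Pure-Python version from the question."""
    return [text[:2]]


def growth_bytes(func, times=100_000):
    """Return how much traced memory grew while calling func in a loop."""
    tracemalloc.start()
    start, _ = tracemalloc.get_traced_memory()
    for _ in range(times):
        res = func('ncd')
    end, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return end - start


print(growth_bytes(get_feature))    # stays small regardless of `times`
print(growth_bytes(leaky_feature))  # grows roughly linearly with `times`
```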
I assume the memory is not released correctly after each call to the C function. How can I release and control the memory after each call?
BTW: I have simplified the question here; the complete C function is in the linked C code, and there is no memory leak in that C code.
Answer:
The code in your example doesn't leak:
#include "test.h"
PyObject *getFeature(wchar_t *text)
{
int ret = -1;
PyObject *featureList = PyList_New(0);
// Create new reference to "curString" (allocates memory)
PyObject *curString = PyUnicode_FromWideChar(text, 2);
// Add "curString" to "featureList", incrementing reference count
ret = PyList_Append(featureList, curString);
// "curString" no longer used, reduce reference count.
Py_DECREF(curString);
// Correctly returns a single reference to the list,
// which contains a single reference to a string
return featureList;
}
When res is reassigned the return value of get_feature_c, the reference count of the previous value of res (a list) is decremented. Since that count reaches zero, the reference count of each item in the list is decremented as well, any item whose count reaches zero is freed, and then the list object itself is freed.
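This release-on-reassignment behaviour can be observed from pure Python. The sketch below is CPython-specific (it relies on reference counting freeing objects immediately); Item and make_feature_list are hypothetical stand-ins for the string object and the C function, and weakref.finalize records when each item is freed.

```python
import weakref

freed = []  # texts of Item objects that have been deallocated


class Item:
    """Hypothetical stand-in for the string object created in getFeature."""

    def __init__(self, text):
        self.text = text


def make_feature_list(text):
    """Mimics getFeature: returns a new list holding one new object."""
    item = Item(text[:2])
    # Append the item's text to `freed` once the item is deallocated.
    weakref.finalize(item, freed.append, item.text)
    return [item]


res = None
for i in range(3):
    # The new list is built first; rebinding res then drops the old list,
    # whose count hits zero, which in turn frees the Item it contained.
    res = make_feature_list('ncd')

print(freed)  # ['nc', 'nc']: the first two lists and their items are gone
```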
But the C code you linked has many leaks caused by missing Py_DECREF calls. When you leak a reference, the object's reference count never reaches zero, so the object is never freed, creating a memory leak:
// "PyUnicode_FromWideChar" creates a new object (1st reference),
// "PyList_Append" adds a 2nd reference owned by "featureList",
// and the 1st reference is never released, so it leaks.
ret = PyList_Append(featureList, PyUnicode_FromWideChar(charCurrentFeature, 2));
Also here:
PyObject *bigrams1 = PySet_New(0);
// each "PyUnicode_FromWideChar" leaks a reference.
ret = PySet_Add(unigrams1, PyUnicode_FromWideChar(L"据", 1));
ret = PySet_Add(unigrams1, PyUnicode_FromWideChar(L"nc", 2));
ret = PySet_Add(unigrams1, PyUnicode_FromWideChar(L"ckd", 3));
ret = PySet_Add(unigrams1, PyUnicode_FromWideChar(L"nc.3e", 5));
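The effect of a missing Py_DECREF can be reproduced from Python without compiling anything. This CPython-only sketch calls the C API functions Py_IncRef and Py_DecRef (the function forms of the Py_INCREF/Py_DECREF macros) through ctypes.pythonapi; the Feature class is a hypothetical stand-in for the string created by PyUnicode_FromWideChar. An unbalanced increment keeps the object alive after every Python name for it is gone, and the missing decrement lets it be freed.

```python
import ctypes
import weakref

# CPython-only: function forms of Py_INCREF / Py_DECREF from the C API.
Py_IncRef = ctypes.pythonapi.Py_IncRef
Py_IncRef.argtypes = [ctypes.py_object]
Py_IncRef.restype = None
Py_DecRef = ctypes.pythonapi.Py_DecRef
Py_DecRef.argtypes = [ctypes.py_object]
Py_DecRef.restype = None


class Feature:
    """Hypothetical stand-in for the string made by PyUnicode_FromWideChar."""


feature = Feature()
alive = weakref.ref(feature)  # weak reference: does not keep it alive
Py_IncRef(feature)            # the reference the C code never gives back
del feature                   # drop the only Python name for the object
survived_del = alive() is not None
print(survived_del)           # True: the unbalanced reference keeps it alive

Py_DecRef(alive())            # the missing Py_DECREF
freed_after_decref = alive() is None
print(freed_after_decref)     # True: count reached zero, object freed
```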
You can test whether your code leaks references by using a debug build of your test DLL together with a debug build of Python. I'll demonstrate with a Windows build:
test.c - debug build compiled with Microsoft Visual Studio
cl /LD /MDd /W3 /Ic:\python310\include test.c -link /libpath:c:\python310\libs
#ifdef _WIN32
# define API __declspec(dllexport)
#else
# define API
#endif
#include <Python.h>
API PyObject *getFeature(wchar_t *text)
{
int ret = -1;
PyObject *featureList = PyList_New(0);
PyObject *curString = PyUnicode_FromWideChar(text, 2); // allocates curString (1st reference)
ret = PyList_Append(featureList, curString); // Creates 2nd reference to curString in featureList
Py_DECREF(curString); // curString no longer used
return featureList;
}
test.py
import ctypes as ct
import sys
feature_extractor = ct.PyDLL('./test')
get_feature_c = feature_extractor.getFeature
get_feature_c.argtypes = ct.c_wchar_p, # OP example code had error here
get_feature_c.restype = ct.py_object
def get_feature(text):
    return [text[:2]]
times = 10
for i in range(times):
    print(sys.gettotalrefcount())  # Only available in debug build of Python
    res = get_feature_c('ncd')
Output when run with a debug build of Python (required for sys.gettotalrefcount()); note that the total reference count doesn't grow across loop iterations:
C:\>python_d test.py
70904
70910
70910
70910
70910
70910
70910
70910
70910
70910
Now, with Py_DECREF commented out, one reference is leaked every loop:
70904
70911
70912
70913
70914
70915
70916
70917
70918
70919
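If a debug build of Python isn't available, a rough check is still possible on a release build: compare sys.getrefcount() of an item returned by the function under test with an item returned by the pure-Python get_feature. This is a sketch under assumptions: it only inspects the first item, it assumes both functions return freshly created (not cached or interned) strings, and extra_item_refs and leaky_get_feature are hypothetical names, the latter simulating a forgotten Py_DECREF via ctypes.pythonapi.Py_IncRef (CPython only).

```python
import ctypes
import sys

# CPython-only: Py_IncRef is the function form of the Py_INCREF macro.
_Py_IncRef = ctypes.pythonapi.Py_IncRef
_Py_IncRef.argtypes = [ctypes.py_object]
_Py_IncRef.restype = None


def get_feature(text):
    """Pure-Python reference version from the question."""
    return [text[:2]]


def leaky_get_feature(text):
    """Hypothetical stand-in for a C function that forgets Py_DECREF(curString)."""
    item = text[:2]
    _Py_IncRef(item)  # unbalanced reference that is never released
    return [item]


def extra_item_refs(func, text='ncd'):
    """Extra references on the first returned item, relative to get_feature.

    0 means func holds no stray references to the item; each forgotten
    Py_DECREF on that item adds 1.
    """
    expected = get_feature(text)
    actual = func(text)
    return sys.getrefcount(actual[0]) - sys.getrefcount(expected[0])


print(extra_item_refs(get_feature))        # 0
print(extra_item_refs(leaky_get_feature))  # 1
```

With the test DLL loaded as above, extra_item_refs(get_feature_c) should report 0 for the correct C code and 1 with Py_DECREF commented out.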