I want to call C code from Python. The C code consists of the following files:
- test.h
#include <Python.h>
PyObject *getFeature(wchar_t *text, PyObject *unigram);
// unigram is a Python set object (type PySetObject)
- test.c
#include "test.h"
PyObject *getFeature(wchar_t *text, PyObject *unigram)
{
    int ret = -1;
    PyObject *featureList = PyList_New(0);
    PyObject *curString = PyUnicode_FromWideChar(text, 2);
    ret = PySet_Contains(unigram, curString);
    printf("## res: `nc`, %d.\n", ret);
    ret = PyList_Append(featureList, curString);
    return featureList;
}
I then compiled it into a shared library called libtest.so. I load this .so file from Python with ctypes, as shown below:
- test.py
import ctypes
import os
dir_path = 'path/to/the/libtest.so'
feature_extractor = ctypes.cdll.LoadLibrary(
    os.path.join(dir_path, 'libtest.so'))
get_feature_c = feature_extractor.getFeature
get_feature_c.argtypes = [ctypes.c_wchar_p, ctypes.py_object]
get_feature_c.restype = ctypes.py_object
unigram = {'据','nc', 'kls'}
print(hash('据'))
print(hash('nc'))
print(hash('kls'))
res = get_feature_c('nc', unigram)
When I run test.py, it prints the three hash values and then crashes:
6875335301337518411
6875335301337518411
-5567445891360670268
Segmentation fault
I believe the bug is caused by a collision between the two different strings nc and 据, which have the same hash value, 6875335301337518411.
Python uses a second-level lookup in its hash table to resolve collisions between strings with the same hash value.
How can I solve this problem and make that collision handling available to the C code?
Answer:
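The hash match is a red herring. The real problem is that the library was loaded with ctypes.cdll (CDLL), which releases the GIL for the duration of each call, while CPython API functions must be called with the GIL held. Load the library with PyDLL instead, which keeps the GIL held during the call.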
test.c
#include <Python.h>
#ifdef _WIN32
# define API __declspec(dllexport)
#else
# define API
#endif
API PyObject *getFeature(wchar_t *text, PyObject *unigram)
{
    int ret = -1;
    PyObject *featureList = PyList_New(0);
    PyObject *curString = PyUnicode_FromWideChar(text, 2);
    ret = PySet_Contains(unigram, curString);
    printf("## res: `nc`, %d.\n", ret);
    ret = PyList_Append(featureList, curString);
    Py_DECREF(curString); // fix reference leak
    return featureList;
}
test.py
import ctypes as ct
dll = ct.PyDLL('./test') # Use PyDLL so GIL is held
dll.getFeature.argtypes = ct.c_wchar_p, ct.py_object
dll.getFeature.restype = ct.py_object
unigram = {'据','nc', 'kls'}
print(hash('据'))
print(hash('nc'))
print(hash('kls'))
print(dll.getFeature('nc', unigram))
Output:
5393181648594783828
5393181648594783828
-5015907635941537187
## res: `nc`, 1.
['nc']
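To see that equal hashes by themselves are harmless, the same set lookup can be tried without compiling anything. The sketch below is only an illustration, not part of the original code: it assumes CPython and calls PySet_Contains through ctypes.pythonapi, which is a PyDLL instance and therefore keeps the GIL held. The name contains and the probe string 'xx' are made up for this example.

```python
import ctypes as ct

# ctypes.pythonapi is a PyDLL, so the GIL stays held during the call.
# `contains` is a hypothetical local name for this illustration.
contains = ct.pythonapi.PySet_Contains
contains.argtypes = [ct.py_object, ct.py_object]
contains.restype = ct.c_int

unigram = {'据', 'nc', 'kls'}

# The set keeps both strings as distinct members even if their hashes match,
# because lookups compare keys for equality after comparing hashes.
print(len(unigram))             # 3
print(contains(unigram, 'nc'))  # 1 (found)
print(contains(unigram, 'xx'))  # 0 (not found)
```

The set lookup compares keys for equality after matching hashes, so nothing extra has to be passed to the C side to handle the collision.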