Python print() corrupting memory allocated by ctypes

Time:06-15

I'm working on some code to act as a Python wrapper for a rather large C++ project. I have created a class wrapper with the associated function wrappers, which make direct calls to the DLL. Since it is a C++ project, it needs a C wrapper as well, which is implemented and working correctly.

const char* MyClass::GetName() {
  printf("Name at %p\n", &Name);
  printf("Name is %s\n", Name);
  return Name;
}

My Python object d is constructed using the Open() method. There is a C++ method GetName() which simply returns the value of Name. I modified this method in the C++ source to print out the address and value of the Name variable for debugging. The get_name() function in Python is the wrapper.

??.Open.restype = POINTER(c_int)
??.GetName.argtypes = (POINTER(c_int),)
??.GetName.restype = c_char_p
d = MyClass()
d.get_name()
print('hi')
d.get_name()

This outputs the following:

Name at 0x80012e598
Name is device_name
hi
Name at 0x80012e598
Name is hi

All the other code I have tested so far leaves "Name is device_name" intact, but after a call to print() the value comes back empty or as the last thing passed to print() (it is empty when the last thing passed was large). It seems like the buffer used by print() overlaps with the memory allocated for the object in C++. If I run the script with the -u flag (unbuffered output), Name is empty every single time:

Name at 0x800111368
Name is device_name
hi
Name at 0x800111368
Name is

Since the C++ code prints out the address of the variable and that address hasn't changed, Python must be modifying the memory when it shouldn't be allowed to.

What steps should I take to further debug/resolve this? Thank you in advance.


EDIT

I worked on a minimal reproducible example and discovered the cause of the issue, but I do not understand why. It was in the __init__ of my Python class. The argument is a string, name, which needs to be converted to bytes to be passed through ctypes. I will show one working example and one breaking example. What is the difference between the two that causes one to work and the other not?

# Create working class
class MyWorkingClass():

    def __init__(self, name):
        self.obj = lib.MyClass_Open(name)

    def get_name(self):
        return lib.MyClass_GetName(self.obj).decode('utf-8')

# This part works
name = bytes('my_name', 'utf-8')
working = MyWorkingClass(name)

for i in range(5):
    print(working.get_name())

And this one gets the wrong data back:

# Create breaking class
class MyBreakingClass():

    def __init__(self, name):
        name = bytes(name, 'utf-8')
        self.obj = lib.MyClass_Open(name)

    def get_name(self):
        return lib.MyClass_GetName(self.obj).decode('utf-8')

# This part doesn't work
breaking = MyBreakingClass('my_name')
for i in range(5):
    print(breaking.get_name())

In both cases, the same exact name should be (from my understanding anyway) getting passed to MyClass_Open(), but clearly that is not the case. Why?

Answer:

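It appears the C++ code (not shown) stores the name pointer that is passed in rather than a copy of the string. In the breaking case, the only reference to the bytes object is the local variable name, so when __init__ returns the object is freed along with the internal buffer that the stored pointer references. The pointer then dangles, and reading through it is undefined behavior (UB).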

In the OP's original problem, it is likely the allocation for 'hi' ended up at the same address, but anything could happen due to UB.
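This also explains why the address printed in the question never changed: printf("%p", &Name) shows where the pointer member itself lives, not where the characters it points to live. A small pure-ctypes sketch (no DLL needed; first, second and name_ptr are illustrative names, and name_ptr merely plays the role of the C++ member const char* Name) shows that the two addresses differ, and that the pointer's own address stays the same even when the text read through it changes:

```python
import ctypes as ct

# name_ptr plays the role of the C++ member `const char* Name`: a C pointer
# whose value is the address of a bytes object's internal buffer.
first = b"device_name"
name_ptr = ct.c_char_p(first)

# What printf("%p", &Name) reports: the address of the pointer variable itself.
pointer_addr = ct.addressof(name_ptr)
# What the pointer actually holds: the address of the characters.
chars_addr = ct.cast(name_ptr, ct.c_void_p).value
print(hex(pointer_addr), hex(chars_addr), ct.string_at(chars_addr))

# Point the same variable at different characters. Its own address is
# unchanged, yet the string read through it is now different.
second = b"hi"
name_ptr.value = second
print(hex(ct.addressof(name_ptr)), ct.string_at(ct.cast(name_ptr, ct.c_void_p).value))
```

In the question the pointer was never reassigned; instead the buffer it referred to was freed and reused. Either way, a constant &Name says nothing about whether the bytes at the other end are still valid.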

Here's a minimal example:

test.cpp - implied implementation from description

#ifdef _WIN32
#   define API __declspec(dllexport)
#else
#   define API
#endif

class MyClass {
    const char* _name;
public:
    MyClass(const char* name) : _name(name) {}    // store pointer during construction
    const char* GetName() const { return _name; } // access pointer later
};

extern "C" {

API MyClass* MyClass_Open(const char* name) {
    return new MyClass(name); // leaks in this example
}

API const char* MyClass_GetName(MyClass* p) {
    return p->GetName();
}

}

test.py - combined examples and made complete

import ctypes as ct

lib = ct.CDLL('./test')
lib.MyClass_Open.argtypes = ct.c_char_p,
lib.MyClass_Open.restype = ct.c_void_p
lib.MyClass_GetName.argtypes = ct.c_void_p,
lib.MyClass_GetName.restype = ct.c_char_p

# Create working class
class MyWorkingClass():

    def __init__(self, name):
        self.obj = lib.MyClass_Open(name)

    def get_name(self):
        return lib.MyClass_GetName(self.obj)

# This part works
# bytes object is created here
# "name" is the only reference but it is still in scope during get_name() below
name = bytes('my_name', 'utf-8')
working = MyWorkingClass(name)

for i in range(5):
    print(working.get_name())

# Create breaking class
class MyBreakingClass():

    def __init__(self, name):
        # bytes object is created here
        # "name" is the only reference and goes out of scope when __init__ returns
        name = bytes(name, 'utf-8')
        self.obj = lib.MyClass_Open(name)

    def get_name(self):
        return lib.MyClass_GetName(self.obj)

# This part doesn't work
breaking = MyBreakingClass('my_name')
for i in range(5):
    print(breaking.get_name()) # garbage output

Output:

b'my_name'
b'my_name'
b'my_name'
b'my_name'
b'my_name'
b'\xf0'        # could be anything due to UB
b'\xf0'
b'\xf0'
b'\xf0'
b'\xf0'
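A minimal sketch of the usual Python-side fix, assuming the C++ side keeps storing the raw pointer: keep the encoded bytes object as an attribute of the wrapper so it lives at least as long as the C++ object that refers to it. MyFixedClass and FakeLib are hypothetical names; FakeLib only imitates test.cpp (it saves the buffer address and reads it back with ctypes.string_at) so that the example runs without compiling a DLL.

```python
import ctypes as ct


class FakeLib:
    """Hypothetical stand-in for test.cpp so the sketch runs without a DLL.

    Like MyClass(const char* name) : _name(name), it remembers only the raw
    address of the buffer it was given and never copies the characters.
    """

    def __init__(self):
        self._names = {}  # handle -> stored char* address
        self._next_handle = 1

    def MyClass_Open(self, name):
        handle = self._next_handle
        self._next_handle += 1
        # Keep only the address of the bytes object's internal buffer.
        self._names[handle] = ct.cast(ct.c_char_p(name), ct.c_void_p).value
        return handle

    def MyClass_GetName(self, handle):
        # Read a NUL-terminated string starting at the stored address.
        return ct.string_at(self._names[handle])


# With the real library this would be ct.CDLL('./test') plus the
# argtypes/restype settings from test.py.
lib = FakeLib()


class MyFixedClass:
    def __init__(self, name):
        # Store the encoded bytes on the instance: while the wrapper is alive,
        # so is the buffer whose address MyClass_Open saved.
        self._name = name.encode("utf-8")
        self.obj = lib.MyClass_Open(self._name)

    def get_name(self):
        return lib.MyClass_GetName(self.obj).decode("utf-8")


fixed = MyFixedClass("my_name")
for i in range(5):
    print(fixed.get_name())
```

If you control the C++ code, the alternative is to copy the string there (for example into a std::string member), so the lifetime of the Python bytes object no longer matters.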

Answer:

As I understand it, you want to create a C++ DLL with some functions that will be called from Python code. In that case you could use pybind11 (https://pybind11.readthedocs.io/en/stable/advanced/cast/strings.html).

For your question, here is an example from the documentation:

#include <iostream>
#include <pybind11/pybind11.h>

namespace py = pybind11;

PYBIND11_MODULE(example, m) {
    m.doc() = "pybind11 example plugin"; // optional module docstring

    // pybind11 encodes the Python str to UTF-8 before passing it as const char*
    m.def("charptr",
        [](const char *s) { std::cout << "My favorite food is\n"; std::cout << s; });
}

Description from https://pybind11.readthedocs.io/en/stable/basics.html

The PYBIND11_MODULE() macro creates a function that will be called when an import statement is issued from within Python. The module name (example) is given as the first macro argument (it should not be in quotes). The second argument (m) defines a variable of type py::module_ which is the main interface for creating bindings. The method module_::def() generates binding code that exposes the add() function to Python. (The documentation's example binds add(); in the example above, the exposed function is the charptr lambda.)

pybind11 is a very powerful tool. If you want, you can also bind a C++ class with pybind11; you will then be able to create instances of that class in Python and call their methods.
