Home > Software engineering >  Can I prevent the invalidation/destruction of 'this' instance pointer in a stored c lamb
Can I prevent the invalidation/destruction of 'this' instance pointer in a stored c lamb

Time:11-08

This question is sort of an extension to this one, as it's a similar problem but doesn't fully answer my problem as much as I would like (not to mention the problem in that question doesn't involve threads/multi-threading).

The Problem

Like the title suggests, I've encountered a specific problem regarding lambdas and std::threads where my background knowledge can't help me. I'm essentially trying to store a std::function, which contains a member function, in a static class to be called later in a separate std::thread. The problem is that this member function needs the this instance pointer by reference because this member function modifies data members of the same class, but the this pointer passed by reference into the lambda is invalidated/destroyed by the time the member function is called, which causes what seems like undefined behavior. This problem is actually somewhat talked about in proposal document P0018R3, where it talks about lambdas and concurrency.

Semi-Minimal Working Example

//Function class
struct FuncWrapper {
    //random data members
    std::string name;
    std::string description;
    //I still have this problem even if this member is const
    std::function<void()> f;
};
class FuncHolder {
public:
    //For the FuncWrapper arguement, I've tried pass by reference, const-reference, rvalue, etc. doesn't make a difference in terms of the problem (as I expect)
    //add a FuncWrapper object to the static holder
    static void add(FuncWrapper f) {
        return const_cast<std::vector<FuncWrapper>&>(funcs).push_back(f);
    }
    //Same here with the return
    //get a static FuncWrapper object by index (with bounds checking)
    static const std::function<void()>& get(size_t index) {
        return funcs.at(index);
    }
private:
    //C  17 inline static definition; If I weren't using it I would do it the non-inline way
    const static inline std::vector<FuncWrapper> funcs;
};
//Data member class
struct Base {
    //This is meant to be like a constexpr; its unique for each derived type
    inline virtual const char* const getName() const = 0;
protected:
    //This will be important for a solution I tried
    const size_t storedIdx = 0;
};
class Derived : public Base {
public:
    Derived(){
        FuncHolder::add({"add5", "adds 5", 
            [&](){increment(5);} //This is the problem-causing statement
        });
        FuncHolder::add({"add1", "adds 1", 
            std::bind(&FuncHolder::increment, 1) //This syntax also causes the same problem
        }); 
    }
    inline const char* const getName() const override {
        return "Derived";
    }
    void increment(int amount){
         //Do some stuff...
         privMember  = amount;
         //Do some other stuff..
    }
private:
    int privMember = 0;
};

//Class to hold the static instances of the different derived type
class BaseHolder {
public:
    //make a new object instace in the static vector
    template<class DerivedTy>
    static void init(const DerivedTy& d){
        static_assert(std::is_base_of_v<Base, DerivedTy>); //make sure it's actually derived
        size_t idx = baseVec.size(); //get the index of the object to be added
        const_cast<std::vector<std::unique_ptr<Base>>&>(baseVec).emplace_back(std::make_unique<DerivedTy>(d)); //forward a new unique_ptr object of derived type
        const_cast<size_t&>(baseVec.at(idx)->storedIdx) = idx; //store the object's index in the object itself
    }
    ///This function is used later for one of the solutions I tried; it goes unused for now
    ///So, assume the passed size_t is always correct in this example
    ///There's probably a better way of doing this, but ignore it for the purposes of the example
    //get a casted reference to the correct base pointer
    template<class DerivedTy>
    static DerivedTy& getInstance(size_t derivedIdx){
        return *(static_cast<DerivedTy*>(baseVec.at(derivedIdx).get()));
    }
private:
    //C  17 inline static again
    const static inline std::vector<std::unique_ptr<Base>> baseVec{};
};
int main() {
    BaseHolder::init(Derived()); //add a new object to the static holder
    //Do stuff...
    std::thread runFunc([&](){
        FuncHolder::Get(0)(); //Undefined behavior invoked here; *this pointer used in the function being called is already destroyed
    });
    //Main thread stuff...
    runFunc.join();
    return 0;
}

It may not be a super minimal example, but I wanted to highlight the important details (such as how the function is stored and the class(es) that call them) so that it's clear how the problem originates.
There's also a few possibly unrelated yet important parts of the design to point out.

  1. it's intended for there to be many classes/types deriving the Base class (i.e. Derived1, Derived2, etc.), but there will only be one instance of each of those derived classes; Hence why all members of the BaseHolder class are static. So if this design does need to be reworked, keep this in mind (though honestly maybe this could be implemented in a better way than it is now, but that may be unrelated to the problem).
  2. It may be instinctual to make the BaseHolder class be templated and just pass the classes/types that I want it to hold to its template at compile time (and thus use something like a tuple instead of a vector), but I didn't do that on purpose because I may need to add more Derived types later in runtime.
  3. I can't really change the template type of f (the std::function<>) because I may need to pass different functions with their own return type and arguments (which is why I use a lambda sometimes and std::bind at times when I just want the callable to a be a function with void return type). In order to accomplish this I just make it a std::function<void()>
  4. The overall goal of this design is to statically call and invoke a function (as if it were triggered by an event) that has been constructed before being called and has the ability to modify a given class (specifically the class its constructed in - Derived in this case).

Problem Origin

Looking into this problem, I know that the this pointer captured by reference in a lambda can be invalidated by the time the lambda runs in a different thread. Using my debugger, I seems like the this pointer was destroyed by the time the lambda was being constructed in the constructor of Derived, which went against my previous knowledge, so I can't be 100% sure this is what's going on; The debugger showed that the entire this instance of Derived was filled with junk values or was unreadable

Derived(){
    FuncHolder::add({"add5", "adds 5", //`this` pointer is fine here
       [&](){increment(5);} //`this` pointer is filled with junk and pointing to a different random address
    });
    //...
}

I'm more sure though about the undefined behavior when the lambda/function is invoked/ran due to its seemingly destroyed this instance of Derived pointer. I get different exceptions each time, from different files, and sometimes just get a flat out access read access violation and what not sometimes. The debugger also can't read the memory of the this pointer of the lambda when it comes around to invoking it; All look like signs of a destroyed pointer.
I've also dealt with this type of problem before in lambdas and know what to do when std::threads are not involved, but the threads seem to complicate things (I'll explain that more later).

What I've Tried

Capture by value

The easiest solution would be to just make the lambda capture the this pointer of Derived by value (as mentioned in both the answer to the aforementioned question and proposal document P0018R3) since I'm using C 17. The proposal document even mentions how capturing this by value is necessary for concurrent applications such as threading:

Derived(){
    FuncHolder::add({"add5", "adds 5", 
        [&, *this](){increment(5);} //Capture *this by value (C  17); it's thread-safe now
    });
    //...
}

The problem with this is, like I said, the functions passed into/captured by the lambda need to modify data members of the class; if I capture this by value, the function is just modifying a copy of the Derived instance, instead of the intended one.

Use Static Instance

Okay, if each derived class is only supposed to have one static instance, and there's a static holder of derived classes, why not just use the static instance in the lambda and modify that instance directly?:

Derived(){
    FuncHolder::add({"add5", "adds 5", 
        [=](){BaseHolder::getInstance(storedIdx)::increment(5);} //use static instance in lambda; Again assume the passed index is always correct for this example
    });
    //...
}

This might look good on paper, but the problem is that the getInstance() is being called in the constructor, before the actual instance is created using the constructor. Specifically, the Derived() constructor is called in BaseHolder::init(Derived()) where init tries to create the instance in vector in the first place; But, the vector is accessed in the Derived() constructor, which is called before init is.

Pass Static Instance to Member Function

Another answer in the aforementioned question says to change the function in the lambda to have a arguement that takes an instance of its class. In our example, it would look something like this:

class Derived : public Base {
public:
    Derived(){
        FuncHolder::add({"add5", "adds 5", 
            [&](){increment(BaseHolder::getInstance(storedIdx), 5);} //Pass the static instance to the actual function
        });
        //...
    }
    //rest of the class...
    void increment(Derived& instance, int amount){
         //Do some stuff...
         instance.privMember  = amount;
         //Do some other stuff..
    }
private:
    int privMember = 0;
};

But this as the same problem as the previous attempted solution (using the static instance in the lambda): the static instance isn't created yet because it's calling the constructor accessing the instance to create it.

shared_ptr of this (directly)

A solution mentioned more than once in the aforementioned question was to make and use a shared_ptr (or any smart pointer for that matter) of this to extend its lifetime and what not (though the answers did not go into depth on how to implement it). The quick-and-dirty way to do this is directly:

Derived(){
    FuncHolder::add({"add5", "adds 5", 
        [self=std::shared_ptr<Derived>()](){self->increment(5);} //pass a shared_ptr of *this; syntax can differ
    });
    //...
}

The problem with this is that you get a std::bad_weak_ptr exception, as doing it this way is probably incorrect anyway (or at least I assume).

shared_ptr of this (std::enable_shared_from_this<T>)

The solution in this blog post, and the solution I usually use when threads are not involved, is to make use of std::enable_shared_from_this<T>::shared_from_this to capture a proper shared_ptr of this:

class Derived : public Base, public std::enable_shared_from_this<Derived> {
    Derived(){
        FuncHolder::add({"add5", "adds 5", 
            [self=shared_from_this()](){self->increment(5);} //pass a shared_ptr of *this
        });
        //...
    }
    //rest of class...
}

This looks good on paper, and doesn't really cause any exceptions, but it doesn't seem to change anything; The problem still remains and it seems no different than just capturing this by reference normally.

Conclusion

Can I prevent the destruction/invalidation of the this pointer of the derived class in the lambda by the time it's called in another thread? If not, what's the right way to do what I am trying to achieve? I.e. how could I rework the design so it functions properly while still keeping my design principles preserved?

CodePudding user response:

I think there are many solutions that could work here, but it's not a "short" problem to describe. Here's my take on two approaches:

1 - Facilitate intentions

Looks like you need to share some state between an object and a lambda. Overriding lifetimes is complicated so do what you intend, and lift the common state into a shared pointer:

class Derived // : Base classes here
{
  std::shared_ptr<DerivedImpl> _impl;

public:
  /*
  Public interface delegates to _impl;
  */
};

this gives the ability to have 2 flavors of interaction with the shared state:

// 1. Keep the shared state alive from the lambda:
func = [impl = _impl]() { /* your shared pointer is valid */ };

// 2. Only use the shared state if the original holder is alive:
func = [impl = std::weak_ptr(_impl)]() {
  if (auto spt = impl.lock())
  {
    // Use the shared state
  }
};

After all the enable_shared_from_this approach didn't work for you because you didn't share the state to begin with. By storing the shared state internally you can keep the value semantics of your original design and facilitate uses where lifetime management gets complicated.

2 - Put state in a safe place

The safest way to guarantee that some state is "alive" at a certain point is to put it at a higher scope:

  1. Static storage / global scope
  2. A stack frame that is higher (or lower depending on your mental model of memory) than both users, the derived class and the lambda.

Having such a "store" would allow you to do "placement new" of your objects, to construct them without using the free store (heap). Then you pair this facility with reference counting, so that the last one to reference the derived object will be the one to call the destructor.

This scheme is not trivial to achieve and you should only shoot for it if there's major performance requirement from this hierarchy. If you decide to go down this road, you can also consider memory pools, that offer this kind of lifetime manipulation and have many of the related puzzles solved before hand.

CodePudding user response:

The thing I see you want to do is to keep an object alive as long as it is in use by another thread. Example here :

#include <future>
#include <thread>
#include <iostream>

class MyClass final
{
public:
    MyClass()
    {
        std::cout << "Hello constructor\n";
    }

    ~MyClass()
    {
        std::cout << "Hello destructor\n";
    }

    int Hello()
    {
        std::cout << "Hello World!\n";
        return 2;
    }
};


int main()
{
    // thread synchronization object
    std::future<int> future;

    // start a scope
    {
        std::cout << "Enter scope\n";
        // create a shared_ptr to the object you want
        // to share between threads.
        auto my_class_ptr = std::make_shared<MyClass>();

        // in C   the preferred abstraction for threads
        // is using std::async this also allows you to
        // return values AND exceptions from other threads
        // 
        // Capture the shared_ptr by value! Its reference count
        // will be increased and the lambda will now also own it.
        // extending its live cycle.
        future = std::async(std::launch::async, [my_class_ptr]
        {
            std::cout << "Async lambda started\n";
            auto retval = my_class_ptr->Hello();
            std::cout << "Async lambda ended\n";
            return retval;
        });

    // go out of scope
    // this will destroy the local shared_ptr to MyClass
    // but will not call destructor of MyClass yet since lambda 
    // still holds a reference
    std::cout << "Leave scope\n";
    }

    // just to show you you can easily get information back
    // from another thread. This call will also wait for the work
    // to finish (kind of like thread.join)
    std::cout << "Wait for async call result\n";
    auto retval = future.get();
    std::cout << "Return value = " << retval << "\n";
    std::cout << "End\n";
    return 0;
}
  • Related