Join threads created recursively

Time:12-06

I have a function that fetches data from a database, parses it, then fetches the other data it depends on, and so on...

The function is therefore recursive, and I want to use multithreading in it.

To simplify the problem, I wrote a dummy program that just expresses the "spirit" of the function:

#include <thread>
#include <vector>

void DummyFunction(std::vector<std::thread>& threads, int& i)
{
    i++;

    // Each level spawns a new thread that recurses to the next level
    if (i < 10)
        threads.push_back(std::thread([&]() { DummyFunction(threads, i); }));
}

int main()
{
    std::vector<std::thread> threads;
    int i = 0;
    DummyFunction(threads, i);

    // Coming here, "DummyFunction" is still running and potentially creating new threads.
    // The issue is that we may enter the for loop before all the threads have been created.
    for (std::thread& thread : threads)
    {
        thread.join();
    }
}

The issue comes from the need to wait for all the threads to finish running before going any further (hence the for loop that joins the threads). But since "DummyFunction" is still running, new threads can still be created, so this approach can't work...

The question is: how can I design such a thing properly (if there is a way)? Can multithreading actually be used recursively?

Answer:

If you have C++20 available, consider using std::jthread, the new thread class that automatically joins on destruction. It will save you the trouble of having to join threads manually.

Answer:

Try a thought experiment: add an else clause to your if statement:

if (i < 10)
{
    threads.push_back(std::thread([&]() { DummyFunction(threads, i); }));
}
else
{
    // do something here
}

Once you make that change, a few minutes' worth of thinking will reach the following conclusion: the "do something here" part gets executed exactly once, in one of the execution threads, after all of the execution threads get created.

Now, the solution should be very obvious:

  1. Add a mutex, a condition variable, and a boolean flag. You can make them global, pass them as additional parameters to DummyFunction, or, better yet, turn your threads vector into its own class containing the vector, the mutex, the condition variable, and the boolean flag, and pass that in recursively instead of just the vector.

  2. main() clears the boolean flag before calling DummyFunction(). After DummyFunction() returns, main() locks the mutex and waits on the condition variable until the boolean flag is set.

  3. The "do something here" part locks the same mutex, sets the boolean flag, signals the condition variable, and unlocks the mutex.

Once you reach this point, you will also realize one more thing: as written, different execution threads all call push_back on the same vector. std::vector is not safe for concurrent modification, so this is a data race and therefore undefined behavior. You will also need a mutex (a separate one, or the existing one, which looks eminently possible to me) to guard every access to the vector.
