I have a question that is similarly answered here, but it is not exactly what I need. I have two threads, each with a loop. Now I want to force the two threads to work like a flip-flop: exactly ABABAB or BABABA... It is not important which one starts first, but they must work one at a time.
Here is a very simple version of my code, but it does not work well because thread A iterates super fast and takes the lock again. Please help me, as I am learning C++ multi-threading.
1- In the linked question it is said that having two threads is maybe not the best approach. Assume it is a game and I must run one iteration for player A and one iteration for player B. I agree it does not give much better efficiency, because at each moment only one of them is working, but I want to learn whether there is a way.
#include <iostream>
#include <mutex>
#include <thread>
using namespace std;

mutex mtx;
int pointA, pointB;

void testA()
{
    int i = 0;
    while (i < 10)
    {
        unique_lock<std::mutex> lck(mtx);
        cout << pointB << endl;
        pointA++;
        i++;
    }
}

int main()
{
    int i = 0;
    pointA = 100, pointB = 0;
    thread t(testA);
    while (i < 10)
    {
        unique_lock<std::mutex> lck(mtx);
        cout << pointA << endl;
        pointB++;
        i++;
    }
    t.join();
}
CodePudding user response:
Without using the standard library's synchronization helpers, and if running the same code in both threads is allowed, you can branch the flow on a shared variable:
// in both threads
unique_lock<std::mutex> lck(mtx);
if(var && myId == "A")
{
// stuff that A does
var = false;
}
else if(!var && myId == "B")
{
// stuff that B does
var = true;
}
but this would be slow, because a thread often finds that its id does not match the turn variable, and re-checking every such case makes it even slower.
C++ has something to help with this:
std::condition_variable
by using condition variable, you can have a different condition per thread automatically triggered to stop waiting:
std::condition_variable cv;
...
std::unique_lock lk(mtx);
cv.wait(lk, []{return your_logic();});
Since the thread just sleeps inside the wait, it does not waste CPU cycles like the first example. Unnecessary wake-ups and processing are reduced, and memory bandwidth is not wasted either.
A more implicit way of combining the outputs of the two threads would be to use two thread-safe queues: one from A to B, and one from B to the output:
// assuming the implementation blocks .front() until an element arrives
ThreadSafeQueue q1;
ThreadSafeQueue q2;

// in thread A
for (int i = 0; i < 10; i++)
    q1.push("A");

// in thread B
for (int i = 0; i < 10; i++)
{
    q2.push(q1.front() + "B");
    q1.pop();
}

// in main thread
auto result = q2.front(); // "AB"
q2.pop();
With this pattern, thread B only works once for each result of thread A. But this does not synchronize the threads in time: thread A could fill its queue with 10 "A" values before thread B processes the 5th "AB", and before the main thread gets the 3rd "AB".
To enforce flip-flop-like work in time, you can limit the capacity of the queues to 1 or 2. Then the first queue blocks thread A until thread B consumes an element, and the second queue blocks thread B until the main thread consumes one.
Yet another way of synchronizing multiple threads across different tasks is a cyclic barrier:
[C++20]
std::barrier sync_point(size /*2?*/, func_on_completion);
// in thread A
..stuff..flip..
sync_point.arrive_and_wait();
..more stuff that needs updated stuff..
// in thread B
..stuff..flop..
sync_point.arrive_and_wait();
..more stuff that needs updated stuff..
the barrier makes sure both threads wait for each other before continuing. If this is in a loop, then they will process one step at a time (one step here meaning both A and B have produced something) while waiting for each other before the next iteration. So it will produce something like ABBAABABBABAAB, while never emitting more A's or more B's than the other. If A is always required before B, then you need more barriers to ensure the order:
// in thread A and B
if(thread is A)
output "A"
sync_point.arrive_and_wait();
if(thread is B)
output "B"
sync_point.arrive_and_wait();
this prints ABABABAB...
If you are using OpenMP, it has a barrier too:
#pragma omp parallel
{
...work...
#pragma omp barrier
...more work...
}
if you don't want the second part to run at the same time as the first part of the next iteration, you need two barriers:
for(...)
#pragma omp parallel
{
...work...
#pragma omp barrier
...more work...
#pragma omp barrier
}
if the order of the two threads' work within each iteration is still important, this requires a dedicated segment for each thread:
for(...)
#pragma omp parallel
{
if(thread is A?)
do this
#pragma omp barrier
if(thread is B?)
do that
#pragma omp barrier
}
this always writes ABABAB, although with decreased efficiency, because the OpenMP parallel-region start/stop overhead is high and measurable in a loop. It would be better to have a loop inside each thread instead:
#pragma omp parallel num_threads(2)
{
// this loop runs the same in both threads; it is not shared/parallelized
for(int i=0;i<10;i++)
{
int id=omp_get_thread_num();
if(id==0)
std::cout<<"A"<<std::endl;
#pragma omp barrier
if(id==1)
std::cout<<"B"<<std::endl;
#pragma omp barrier
}
}
this outputs ABABABABAB... and has no repeated OpenMP region start/stop overhead (but the barrier overhead still exists).
CodePudding user response:
Based on this answer and the answer above, I managed to write the code. We need one flag to switch the turn between the two loops. There is also another way, with a ready-go approach, explained well here; it is in C# but the concepts are the same:
#include <iostream>
#include <thread>
#include <mutex>
#include <condition_variable>
using namespace std;
mutex mutex1;
condition_variable cv3;
char turn;
void ThreadA()
{
for (int i = 0; i < 1000; i++)
{
unique_lock<mutex> lock(mutex1);
cv3.wait(lock, [] {return (turn == 'A'); });
cout << "Thread A" << endl;
turn = 'B';
cv3.notify_all();
}
}
void ThreadB()
{
for (int i = 0; i < 1000; i++)
{
unique_lock<mutex> lock(mutex1);
cv3.wait(lock, [] {return (turn == 'B'); });
cout << "Thread B" << endl;
turn = 'A';
cv3.notify_all();
}
}
void ExecuteThreads()
{
turn = 'A';
std::thread t1(ThreadA);
std::thread t2(ThreadB);
t1.join();
t2.join();
std::cout << "Finished" << std::endl;
}
int main()
{
ExecuteThreads();
return 0;
}