pass-by-reference in lambda function, multithreading C++


The code below is supposed to test the runtime of the sin and cos functions with different numbers of threads. I am writing this for a project where runtime is very relevant; it is a feasibility study of whether multithreading will decrease the runtime enough.

The idea is to pass it a different SAMPLE_SIZE and NUM_THREADS and see how it affects runtime.

Problem: The output is not what I expected.

  1. The ID inside the void function cos_sin_multiplication is always incremented by one, so I get (ID: 1 ... ID: NUM_THREADS) instead of (ID: 0 ... ID: NUM_THREADS - 1).
  2. When I run the code with 2/3/4 threads I get a segmentation fault.
  3. When I run with 7 or more threads, several IDs are changed to NUM_THREADS.
  4. The output of cos_out[0] is always 0.

Here is an example output for NUM_THREADS = 8 and SAMPLE_SIZE = 100'000.

Initiate Thread: 0 with 12500 datapoints.
Initiate Thread: 1 with 12500 datapoints.
Initiate Thread: 2 with 12500 datapoints.
Initiate Thread: 3 with 12500 datapoints.
Initiate Thread: 4 with 12500 datapoints.
Initiate Thread: 5 with 12500 datapoints.
Initiate Thread: 6 with 12500 datapoints.
Initiate Thread: 7 with 12500 datapoints.
ID: 4: sin: 0.861292 cos: -1.72477
ID: 8: sin: -56.1798 cos: 55.4332
ID: 8: sin: -68.1969 cos: 51.9351
ID: 3: sin: 0.861292 cos: -1.72477
ID: 2: sin: 0.861292 cos: -1.72477
ID: 1: sin: 0.861292 cos: -1.72477
ID: 8: sin: -61.1793 cos: 58.8878
ID: 8: sin: -64.8086 cos: 59.5946
The execution took: 0.004465 seconds. 
ID: 0: sin: 59.5946 cos: 0
ID: 1: sin: 0.861292 cos: -1.72477
ID: 2: sin: 0.861292 cos: -1.72477
ID: 3: sin: 0.861292 cos: -1.72477
ID: 4: sin: 0.861292 cos: -1.72477
ID: 5: sin: 0 cos: 0
ID: 6: sin: 0 cos: 0
ID: 7: sin: 0 cos: 0

Can anyone point me in the right direction?

// Multithreaded cosine and sine calculation benchmark
// Calculate a sample of cosine and sine values with different numbers of threads
// to determine the runtime gain for different numbers of threads

#include <math.h>

#include <iostream>
#include <fstream>
#include <thread>
#include <mutex>
#include <chrono>
#include <vector>

#define NUM_THREADS 3
#define SAMPLE_SIZE 2000000
#define PI 3.1415

float diff_time;

std::ofstream calc_speed;
std::mutex out_guard;

void cos_sin_multiplication(int id, int sample, float theta, float& value, float& sin_out, float& cos_out){
    for (int j = 0; j < sample; j++){
        sin_out += sin(PI*theta);
        cos_out += cos(PI*theta);
        theta += 0.1;
    }
    out_guard.lock();
    std::cout << "ID: " << id << ": sin: " << sin_out << " cos: " << cos_out << "\n";
    out_guard.unlock();
}

int main(){
    auto start_time = std::chrono::system_clock::now();

    std::vector<std::thread> Threads;

    int64_t sample_per_thread;
    int mod_sample_per_thread = SAMPLE_SIZE%NUM_THREADS;
    float value[SAMPLE_SIZE];

    float theta = 0.0;
    float cos_out[NUM_THREADS];
    float sin_out[NUM_THREADS];
    

    for(int i = 0; i < NUM_THREADS; i++){
        cos_out[i] = 0.0;
        sin_out[i] = 0.0;
    }

    for(int i = 0; i < NUM_THREADS; i++){
        
        if (i < mod_sample_per_thread){
            sample_per_thread = SAMPLE_SIZE/NUM_THREADS + 1;
        }
        else{
            sample_per_thread = SAMPLE_SIZE/NUM_THREADS;
        }
        out_guard.lock();
        std::cout << "Initiate Thread: " << i <<" with "<< sample_per_thread << " datapoints." << "\n";
        out_guard.unlock();

        Threads.emplace_back([&](){cos_sin_multiplication(i, sample_per_thread, theta, value[0], sin_out[i], cos_out[i]);});
    }

    for(auto& t: Threads){
        t.join();
    }

    auto end_time = std::chrono::system_clock::now();
    std::chrono::duration<double> diff_time = end_time - start_time;
    out_guard.lock();
    std::cout << "The execution took: " << diff_time.count() << " seconds. \n";
    out_guard.unlock();

    for(int i = 0; i < NUM_THREADS; i++){
        out_guard.lock();
        std::cout << "ID: " << i << ": sin: " << sin_out[i] << " cos: " << cos_out[i] << "\n";
        out_guard.unlock();
    }
    return 0;
}

Solution: Replace [&] with [&, i=i, sample_per_thread=sample_per_thread] so that only the things that actually need to be passed by reference are captured by reference.
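
For reference, a minimal sketch of how the thread-creation line might look with that capture list (an assumption about the intended fix, reusing the names from the code above): i and sample_per_thread are copied into each closure when the thread is created, while theta and the output arrays remain captured by reference.

Threads.emplace_back([&, i = i, sample_per_thread = sample_per_thread](){
    // i and sample_per_thread are per-closure copies now, so each thread
    // sees the values that were current when its std::thread was created.
    cos_sin_multiplication(i, sample_per_thread, theta, value[0],
                           sin_out[i], cos_out[i]);
});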

CodePudding user response:

Threads.emplace_back([&](){cos_sin_multiplication(i, sample_per_thread, theta, value[0], sin_out[i], cos_out[i]);});
    }

C++ gives you no guarantees, whatsoever, about exactly when the new execution thread will actually start executing this closure. The only thing you can rely on is that this happens at some point after the new std::thread object gets constructed (as part of the emplace). That is nowhere near what must happen for this to work correctly. The only situation where everything works correctly is if the execution thread begins executing the closure, and evaluates all of the parameters to the function call, before the parent execution thread iterates the for loop immediately afterwards. The chances of that are not very good.

So, in addition to everything else that goes wrong, sample_per_thread will be whatever value was last calculated for it as well.

It is entirely possible that all of your execution threads will end up executing this closure, and evaluating all of the parameters that were captured by reference, only after the for loop has finished and i has been destroyed, making everything undefined behavior.
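
The lifetime problem can be seen without any threads at all. The following self-contained sketch (not from the question) stores closures that capture the loop variable by reference and only invokes them after the loop has ended, so the reference dangles:

#include <functional>
#include <iostream>
#include <vector>

int main(){
    std::vector<std::function<void()>> tasks;
    for (int i = 0; i < 3; i++){
        // i is captured by reference, but each task only runs after the
        // loop, when i has gone out of scope: reading it then is undefined
        // behavior, just like the captured loop variable above.
        tasks.emplace_back([&](){ std::cout << "i = " << i << "\n"; });
    }
    for (auto& task : tasks){
        task();   // may print stale values, garbage, or crash
    }
    return 0;
}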

Even if some of the execution threads managed to wake up and smell the coffee a little earlier, you still have no guarantee, whatsoever, that sample_per_thread would be the value calculated just before that thread's std::thread object was constructed. In fact, it is pretty much guaranteed that at least some of the execution threads will read the captured-by-reference sample_per_thread only after it has already been recalculated for the next execution thread's ostensible consumption.

In other words, nothing here works correctly because everything gets captured by reference.
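
One way to avoid capturing anything at all is to drop the lambda and hand the arguments straight to the std::thread constructor, which copies them into per-thread storage at construction time; std::ref marks the few outputs that genuinely must be references. A sketch, reusing the names from the question's code:

// No capture at all: i, sample_per_thread and theta are copied for each
// thread when the std::thread is constructed; only the per-thread output
// slots are deliberately passed by reference.
Threads.emplace_back(cos_sin_multiplication, i, sample_per_thread, theta,
                     std::ref(value[0]), std::ref(sin_out[i]),
                     std::ref(cos_out[i]));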
