I'm experimenting with threads. My program is supposed to sum a vector by splitting it into sections and creating one thread to sum each section. Currently, my vector has 5 * 10^8 elements, which my PC should handle easily. However, creating each thread (4 threads in my case) takes an insanely long time. I'm wondering why...?
#include <iostream>
#include <thread>
#include <vector>
#include <mutex>
#include <algorithm>
#include <numeric>
#include <ctime>
std::mutex m;
int ans = 0;
void sumPart(const std::vector<int>& v, int a, int b){
    std::lock_guard<std::mutex> guard(m);
    ans += std::accumulate(v.begin() + a, v.begin() + b, 0);
}
void sum(const std::vector<int>& v){
    //threadCount is 4 on my pc
    int threadCount = std::max(2, (int)std::thread::hardware_concurrency()/2);
    int sz = v.size()/threadCount;
    std::vector<std::thread> threads;
    for(int i = 0; i < threadCount; i++){
        clock_t start = clock();
        threads.push_back(std::thread(sumPart, v, sz*i, sz*(i+1)));
        std::cout << "thread " << i+1 << " took " << (clock()-start)/(CLOCKS_PER_SEC/1000) << " ms to create" << std::endl;
    }
    for(std::thread& t : threads){
        t.join();
    }
    //the leftovers
    ans += std::accumulate(v.begin() + (threadCount)*sz, v.end(), 0);
}
int main(){
    const int N = 5e8;
    std::vector<int> v(N);
    for(int i = 0; i < N; i++){
        v[i] = i;
    }
    sum(v);
    std::cout << ans << std::endl;
}
Output:
thread 1 took 681 ms to create
thread 2 took 824 ms to create
thread 3 took 818 ms to create
thread 4 took 814 ms to create
1711656320
Also, if I decrease the number of elements in the vector, the time it takes to create each thread decreases as well, which is weird... (I also know I'm getting int overflow, but that's beside the point.)
Answer:
std::thread(sumPart, v, sz*i, sz*(i+1))
Arguments to thread functions are copied as part of creating the execution thread. Even though sumPart takes its parameter by reference, v gets copied internally. Copying a vector with 500000000 values takes a noticeable amount of time, which is also why the delay shrinks when the vector is smaller.
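If you want to see the copy happen, here is a small sketch. Tracked, work, and the two helper functions are hypothetical names made up for this illustration; the idea is just to count copy-constructor calls when an object is handed to std::thread directly versus through std::cref.

```cpp
#include <atomic>
#include <functional>  // std::cref
#include <thread>

// Hypothetical type that counts how many times it is copy-constructed.
struct Tracked {
    static std::atomic<int> copies;
    Tracked() = default;
    Tracked(const Tracked&) { ++copies; }
    Tracked(Tracked&&) noexcept = default;
};
std::atomic<int> Tracked::copies{0};

// Takes its parameter by const reference, just like sumPart does.
void work(const Tracked&) {}

// Passes the object straight to std::thread: the constructor stores its
// own copy of every argument, so at least one copy is made.
int copiesWhenPassedDirectly() {
    Tracked t;
    Tracked::copies = 0;
    std::thread th(work, t);
    th.join();
    return Tracked::copies;
}

// Wraps the object in std::cref: only the small reference wrapper is
// copied, so the object itself is never copied.
int copiesWhenPassedWithRef() {
    Tracked t;
    Tracked::copies = 0;
    std::thread th(work, std::cref(t));
    th.join();
    return Tracked::copies;
}
```

The first helper should report at least one copy and the second should report none, even though work takes a const reference in both cases.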
You can use std::ref to effectively pass v by reference to your thread function. Note that, as has been mentioned, your lock will serialize all of your execution threads, because each one holds the mutex for its entire std::accumulate call. However, they'll be started very quickly.
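As a rough sketch of how both points could be addressed together (this is one possible approach, not the only one): the vector is passed with std::cref, which for a const vector does the same job as std::ref, and each thread writes to its own partial result so no mutex is needed. The names sumRange and parallelSum, the per-thread partial vector, and the switch to long long to avoid the overflow are my own assumptions, not part of the original code.

```cpp
#include <cstddef>
#include <functional>  // std::ref, std::cref
#include <numeric>
#include <thread>
#include <vector>

// Sums v[first, last) into out. Each thread gets its own output slot,
// so the threads never contend on a shared variable or a lock.
void sumRange(const std::vector<int>& v, std::size_t first,
              std::size_t last, long long& out) {
    out = std::accumulate(v.begin() + first, v.begin() + last, 0LL);
}

// Hypothetical helper: splits v into threadCount chunks and sums them
// in parallel without copying v.
long long parallelSum(const std::vector<int>& v, unsigned threadCount) {
    if (threadCount == 0) threadCount = 1;
    const std::size_t sz = v.size() / threadCount;
    std::vector<long long> partial(threadCount, 0);
    std::vector<std::thread> threads;
    threads.reserve(threadCount);
    for (unsigned i = 0; i < threadCount; ++i) {
        const std::size_t first = sz * i;
        // The last chunk also takes the leftover elements.
        const std::size_t last =
            (i + 1 == threadCount) ? v.size() : sz * (i + 1);
        // std::cref / std::ref make std::thread store references
        // instead of copying the vector and the result slot.
        threads.emplace_back(sumRange, std::cref(v), first, last,
                             std::ref(partial[i]));
    }
    for (std::thread& t : threads) {
        t.join();
    }
    // Combine the per-thread results once all threads have finished.
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

Calling parallelSum(v, 4) from main would replace the sum(v) call. Since the partial results are only read after join(), no synchronization is needed beyond the joins themselves.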