I am new to parallel programming. I have been playing around with multi-threading and for some reason, multi-threading the Mandelbrot set is slower than running a single thread. I have been trying to figure it out for hours. Are there any improvements that you could suggest or perhaps something is wrong with some parts?
Mandelbrot.cpp:
// Write the image to a TGA file with the given name.
void Mandelbrot::write_tga(const char* filename)
{
ofstream outfile(filename, ofstream::binary);
uint8_t header[18] = {
0, // no image ID
0, // no colour map
2, // uncompressed 24-bit image
0, 0, 0, 0, 0, // empty colour map specification
0, 0, // X origin
0, 0, // Y origin
WIDTH & 0xFF, (WIDTH >> 8) & 0xFF, // width
HEIGHT & 0xFF, (HEIGHT >> 8) & 0xFF, // height
24, // bits per pixel
0, // image descriptor
};
outfile.write((const char*)header, 18);
for (int y = 0; y < HEIGHT; y)
{
for (int x = 0; x < WIDTH; x)
{
uint8_t pixel[3] = {
image[y][x] & 0xFF, // blue channel
(image[y][x] >> 8) & 0xFF, // green channel
(image[y][x] >> 16) & 0xFF, // red channel
};
outfile.write((const char*)pixel, 3);
}
}
outfile.close();
if (!outfile)
{
// An error has occurred at some point since we opened the file.
cout << "Error writing to " << filename << endl;
exit(1);
}
}
void Mandelbrot::printProgress() {
// *** CONSUMER *** //
while (progressPercentage <= 100.00)
{
//std::cout << "\nAquiring lock in PrintProgress\n";
unique_lock<mutex> lock(progressMutex);
while (!progressReady)
{
progressConditionVariable.wait(lock);
}
//std::cout << "\nlock aquired\n";
progressReady = false;
cout << progressPercentage << endl;
}
}
void Mandelbrot::calcProgress() {
// *** PRODUCER *** //
//shared progress variable
//protect with a mutex
//print_func() waits to be signaled by worker threads and outputs
//condition variables
unique_lock<mutex> lock(progressMutex);
progressCount = 1;
//double ThreadNum = numOfThreads;
progressPercentage = (double)progressCount / (double)numOfThreads;
progressReady = true;
lock.unlock();
// progress condition variable
progressConditionVariable.notify_one();
}
// Render the Mandelbrot set into the image array.
// The parameters specify the region on the complex plane to plot.
void Mandelbrot::compute_mandelbrot(double left, double right, double top, double bottom, int startNum, int endNum, int y)
{
// The number of times to iterate before we assume that a point isn't in the
// Mandelbrot set.
// (You may need to turn this up if you zoom further into the set.)
const int MAX_ITERATIONS = 500;
for (int x = 0; x < WIDTH ; x)
{
// Work out the point in the complex plane that
// corresponds to this pixel in the output image.
complex<double> c(left (x * (right - left) / WIDTH),
top (y * (bottom - top) / HEIGHT));
// Start off z at (0, 0).
complex<double> z(0.0, 0.0);
// Iterate z = z^2 c until z moves more than 2 units
// away from (0, 0), or we've iterated too many times.
int iterations = 0;
while (abs(z) < 2.0 && iterations < MAX_ITERATIONS)
{
z = (z * z) c;
iterations;
}
if (iterations == MAX_ITERATIONS)
{
// z didn't escape from the circle.
// This point is in the Mandelbrot set.
image[y][x] = 0xFCC2E6; // darker pink
}
else
{
// z escaped within less than MAX_ITERATIONS
// iterations. This point isn't in the set.
//image[y][x] = 0xFFEAF7; // pink
int i = iterations;
if (i < 5)
image[y][x] = 0xFFEAF7;
else if (i < 10)
image[y][x] = 0xF9BCDD;
else if (i < 40)
image[y][x] = 0xD5A4CF;
}
image[endNum - 1][x] = 0xFF0000;
}
// call progress function
calcProgress();
}
Mandelbrot.h:
#pragma once
#include <chrono>
#include <cstdint>
#include <cstdlib>
#include <complex>
#include <fstream>
#include <iostream>
#include <vector>
#include <thread>
#include <mutex>
#include <condition_variable>
#include <queue>
#include <atomic>
#include "Mandelbrot.h"
const int WIDTH = 1020; //ori 1920
const int HEIGHT = 600; // ori 1200
// Import things we need from the standard library
using std::chrono::duration_cast;
using std::chrono::milliseconds;
using std::complex;
using std::cout;
using std::endl;
using std::ofstream;
using std::thread;
using std::mutex;
using std::condition_variable;
using std::unique_lock;
class Mandelbrot
{
private:
public:
// initialise variables
mutex progressMutex;
condition_variable progressConditionVariable;
bool progressReady = false;
int progressCount = 0;
int numOfThreads = 10;
double progressPercentage = 0;
// The image data.
// The size of the image to generate.
uint32_t image[HEIGHT][WIDTH];
//functions
void write_tga(const char* filename);
void printProgress();
void calcProgress();
void compute_mandelbrot(double left, double right, double top, double bottom, int startNum, int endNum, int y);
};
source.cpp:
#include <chrono>
#include <cstdint>
#include <cstdlib>
#include <complex>
#include <fstream>
#include <iostream>
#include <vector>
#include <thread>
#include <mutex>
#include <condition_variable>
#include <queue>
#include <atomic>
#include "Mandelbrot.h"
// Import things we need from the standard library
using std::chrono::duration_cast;
using std::chrono::milliseconds;
using std::complex;
using std::cout;
using std::endl;
using std::ofstream;
using std::thread;
using std::mutex;
using std::condition_variable;
using std::unique_lock;
// Define the alias "the_clock" for the clock type we're going to use.
typedef std::chrono::steady_clock the_clock;
int main(int argc, char* argv[])
{
Mandelbrot* mandelbrot = new Mandelbrot;
// This shows the whole set.
int lastNum = 0;
int calcHeight = HEIGHT / mandelbrot->numOfThreads;
int End = calcHeight;
// Start timing
the_clock::time_point start = the_clock::now();
std::vector<thread> threads;
//std::thread Consume(mandelbrot->printProgress); //previous
std::thread Consumer([&] {
//lambda function
while (mandelbrot->progressPercentage <= 100.00) {
mandelbrot->printProgress();
}
});
for (int i = 0; i < mandelbrot->numOfThreads; i )
{
//std::thread newThread(mandelbrot->compute_mandelbrot(-2.0, 1.0, 1.125, -1.125, lastNum, End)); //previous
std::thread newThread([&] {
//lambda function
for (int y = lastNum; y < End; y)
{
//std::cout << "Before compute\n";
mandelbrot->compute_mandelbrot(-2.0, 1.0, 1.125, -1.125, lastNum, End, y);
}
lastNum = End;
End = End calcHeight;
});
//threads.push_back(thread(compute_mandelbrot, -2.0, 1.0, 1.125, -1.125, lastNum, End));
threads.push_back(std::move(newThread));
}
// Wait for threads to finish.
for (auto &t : threads) {
t.join();
}
// Stop timing
the_clock::time_point end = the_clock::now();
Consumer.join();
// Compute the difference between the two times in milliseconds
auto time_taken = duration_cast<milliseconds>(end - start).count();
cout << "Computing the Mandelbrot set took " << time_taken << " ms." << endl;
mandelbrot->write_tga("output.tga");
return 0;
}
CodePudding user response:
There is an error in the work distribution. Total iterations are not same when checked by a global counter:
std::atomic<int> totalIterations{0}; // this
class Mandelbrot
{
...
and putting the increment in the internal loop:
totalIterations ;
iterations;
then print the total at the end:
std::cout<<"total iterations = " << totalIterations.load()<<std::endl;
return 0;
}
- 1 thread:
total iterations = 72024658
--> 1 second - 2 threads:
total iterations = 107846383
- 3 threads:
total iterations = 143879939
- 7 threads:
total iterations = 283689344
--> 10 seconds
My guess is that those "extra" iterations are on same thread (possibly the first thread), causing it to compute more and more while all other threads getting mostly empty(1-2 iterations only) pixels for no speedup. As a rule of comparing Mandelbrot performance on different systems, the output image has to be identical. With more iterations between different settings, there has to be a different image output. Check if Mandelbrot surface is equally distributed to all threads or implement a dynamic load-balancing instead. Static load-balancing is not good on Mandelbrot-Set that has non-balanced workload per pixel.
If output images are identical, then possibly one of the threads are overwriting own results onto other threads' results. The work distribution is overlapping one or more threads' pixels with other threads' pixels. Overwriting/over-computing can also decrease performance (not to mention the data race).
You can try a very simple load-balancing:
std::atomic<int> workIndex{0};
// worker thread-1, worker thread-2, ...
std::thread thr([&](){
while(workIndex.load()<MAX_PIXELS)
{
// get last iteration position then add 1000, as a way of allocating so-called workload
auto myWork = workIndex.fetch_add(1000);
// start from myWork,
// compute 1000 pixels or just number of pixels remaining
if(myWork<MAX_PIXELS)
compute(myWork,std::min(1000, MAX_PIXELS-myWork);
// progressPercentage =
// ( 1000.0 / MAX_PIXELS ) * 100.00001
// mind the last digit for float ^
calcProgress();
}
});
no need to make work-balance calculations too complex for a problem that is already complex on the work amount per pixel.
CodePudding user response:
You can't base the y
loop on anything shared between all threads or the threads interfere with each other. Instead you have to compute the start and end of the lambda and pass i
as argument to the lambda:
std::thread newThread([mandelbrot, calcHeight](int i) {
//lambda function
int a = i * calcHeight, b = (i 1) * calcHeight;
for (int y = a; y < b; y)
{
//std::cout << "Before compute\n";
mandelbrot->compute_mandelbrot(-2.0, 1.0, 1.125, -1.125, a, b, $
}
}, i);
You also have to fix the progress computation in Mandelbrot.cpp:
progressPercentage = (double)progressCount / HEIGHT * 100 1;
The 1
is a hack to make the consumer actually finish because otherwise the code hangs at 100%, probably because it's something like 99.99999999%. Doubles are imprecise.