Home > OS >  Multi-threading slower than single thread
Multi-threading slower than single thread

Time:05-13

I am new to parallel programming. I have been playing around with multi-threading and for some reason, multi-threading the Mandelbrot set is slower than running a single thread. I have been trying to figure it out for hours. Are there any improvements that you could suggest or perhaps something is wrong with some parts?

Mandelbrot.cpp:


// Write the image to a TGA file with the given name.
void Mandelbrot::write_tga(const char* filename)
{

    ofstream outfile(filename, ofstream::binary);

    uint8_t header[18] = {
        0, // no image ID
        0, // no colour map
        2, // uncompressed 24-bit image
        0, 0, 0, 0, 0, // empty colour map specification
        0, 0, // X origin
        0, 0, // Y origin
        WIDTH & 0xFF, (WIDTH >> 8) & 0xFF, // width
        HEIGHT & 0xFF, (HEIGHT >> 8) & 0xFF, // height
        24, // bits per pixel
        0, // image descriptor
    };
    outfile.write((const char*)header, 18);

    for (int y = 0; y < HEIGHT;   y)
    {
        for (int x = 0; x < WIDTH;   x)
        {
            uint8_t pixel[3] = {
                image[y][x] & 0xFF, // blue channel
                (image[y][x] >> 8) & 0xFF, // green channel
                (image[y][x] >> 16) & 0xFF, // red channel
            };
            outfile.write((const char*)pixel, 3);
        }
    }

    outfile.close();
    if (!outfile)
    {
        // An error has occurred at some point since we opened the file.
        cout << "Error writing to " << filename << endl;
        exit(1);
    }
}


void Mandelbrot::printProgress() {
    // *** CONSUMER *** //

    while (progressPercentage <= 100.00)
    {
        //std::cout << "\nAquiring lock in PrintProgress\n";
        unique_lock<mutex> lock(progressMutex);

        while (!progressReady)
        {
            progressConditionVariable.wait(lock);
        }
        //std::cout << "\nlock aquired\n";
        

        progressReady = false;

        cout << progressPercentage << endl;
        
    }
    
}

void Mandelbrot::calcProgress() {
    // *** PRODUCER *** // 

    //shared progress variable
    //protect with a mutex
    //print_func() waits to be signaled by worker threads and outputs
    //condition variables

    unique_lock<mutex> lock(progressMutex);
    progressCount  = 1;

    //double ThreadNum = numOfThreads;
    progressPercentage = (double)progressCount / (double)numOfThreads;
    progressReady = true;
    lock.unlock();

    // progress condition variable
    progressConditionVariable.notify_one();

}

// Render the Mandelbrot set into the image array.
// The parameters specify the region on the complex plane to plot.
void Mandelbrot::compute_mandelbrot(double left, double right, double top, double bottom, int startNum, int endNum, int y)
{

    // The number of times to iterate before we assume that a point isn't in the
    // Mandelbrot set.
    // (You may need to turn this up if you zoom further into the set.)
    const int MAX_ITERATIONS = 500;

    for (int x = 0; x < WIDTH ;   x)
    {

        // Work out the point in the complex plane that
        // corresponds to this pixel in the output image.
        complex<double> c(left   (x * (right - left) / WIDTH),
            top   (y * (bottom - top) / HEIGHT));

        // Start off z at (0, 0).
        complex<double> z(0.0, 0.0);

        // Iterate z = z^2   c until z moves more than 2 units
        // away from (0, 0), or we've iterated too many times.
        int iterations = 0;
        while (abs(z) < 2.0 && iterations < MAX_ITERATIONS)
        {
            z = (z * z)   c;
              iterations;
        }

        if (iterations == MAX_ITERATIONS)
        {
            // z didn't escape from the circle.
            // This point is in the Mandelbrot set.
            
            image[y][x] = 0xFCC2E6; // darker pink
        }
        else
        {
            // z escaped within less than MAX_ITERATIONS
            // iterations. This point isn't in the set.
            //image[y][x] = 0xFFEAF7; // pink

            int i = iterations;
            if (i < 5)
                image[y][x] = 0xFFEAF7;
            else if (i < 10)
                image[y][x] = 0xF9BCDD;
            else if (i < 40)
                image[y][x] = 0xD5A4CF;

        }

    image[endNum - 1][x] = 0xFF0000;

    }
    // call progress function 
    calcProgress();
}

Mandelbrot.h:

#pragma once

#include <chrono>
#include <cstdint>
#include <cstdlib>
#include <complex>
#include <fstream>
#include <iostream>
#include <vector>
#include <thread>
#include <mutex>
#include <condition_variable>
#include <queue>
#include <atomic>

#include "Mandelbrot.h"

const int WIDTH = 1020;  //ori 1920 
const int HEIGHT = 600; // ori 1200

// Import things we need from the standard library
using std::chrono::duration_cast;
using std::chrono::milliseconds;
using std::complex;
using std::cout;
using std::endl;
using std::ofstream;
using std::thread;
using std::mutex;
using std::condition_variable;
using std::unique_lock;


class Mandelbrot
{
private:

public:
    // initialise variables 
    mutex progressMutex;
    condition_variable progressConditionVariable;
    bool progressReady = false;
    int progressCount = 0;
    int numOfThreads = 10;
    double progressPercentage = 0;

    // The image data.
    // The size of the image to generate.
    uint32_t image[HEIGHT][WIDTH];


    //functions
    void write_tga(const char* filename);
    void printProgress();
    void calcProgress();
    void compute_mandelbrot(double left, double right, double top, double bottom, int startNum, int endNum, int y);

};

source.cpp:

#include <chrono>
#include <cstdint>
#include <cstdlib>
#include <complex>
#include <fstream>
#include <iostream>
#include <vector>
#include <thread>
#include <mutex>
#include <condition_variable>
#include <queue>
#include <atomic>

#include "Mandelbrot.h"

// Import things we need from the standard library
using std::chrono::duration_cast;
using std::chrono::milliseconds;
using std::complex;
using std::cout;
using std::endl;
using std::ofstream;
using std::thread;
using std::mutex;
using std::condition_variable;
using std::unique_lock;


// Define the alias "the_clock" for the clock type we're going to use.
typedef std::chrono::steady_clock the_clock;


int main(int argc, char* argv[])
{
    Mandelbrot* mandelbrot = new Mandelbrot;


    // This shows the whole set.
    int lastNum = 0;
    int calcHeight = HEIGHT / mandelbrot->numOfThreads;
    int End = calcHeight;

    // Start timing
    the_clock::time_point start = the_clock::now();

    std::vector<thread> threads;
    

    //std::thread Consume(mandelbrot->printProgress); //previous
    std::thread Consumer([&] {
        //lambda function

        while (mandelbrot->progressPercentage <= 100.00) {
            mandelbrot->printProgress();
        }

    });

    

    for (int i = 0; i < mandelbrot->numOfThreads; i  )
    {
        //std::thread newThread(mandelbrot->compute_mandelbrot(-2.0, 1.0, 1.125, -1.125, lastNum, End)); //previous
        std::thread newThread([&] {
            //lambda function

            for (int y = lastNum; y < End;   y)
            {
                //std::cout << "Before compute\n";
                mandelbrot->compute_mandelbrot(-2.0, 1.0, 1.125, -1.125, lastNum, End, y);
            }
            lastNum = End;
            End = End   calcHeight;
        });

        //threads.push_back(thread(compute_mandelbrot, -2.0, 1.0, 1.125, -1.125, lastNum, End));
        threads.push_back(std::move(newThread));
    
    }

    // Wait for threads to finish.
    for (auto &t : threads) {
        t.join();
    }

    // Stop timing
    the_clock::time_point end = the_clock::now();


    Consumer.join();


    // Compute the difference between the two times in milliseconds
    auto time_taken = duration_cast<milliseconds>(end - start).count();
    cout << "Computing the Mandelbrot set took " << time_taken << " ms." << endl;

    mandelbrot->write_tga("output.tga");

    return 0;
}

CodePudding user response:

There is an error in the work distribution. Total iterations are not same when checked by a global counter:

std::atomic<int> totalIterations{0}; // this

class Mandelbrot
{
   ...

and putting the increment in the internal loop:

        totalIterations  ;
          iterations;

then print the total at the end:

    std::cout<<"total iterations = " << totalIterations.load()<<std::endl;
    return 0;
}
  • 1 thread: total iterations = 72024658 --> 1 second
  • 2 threads: total iterations = 107846383
  • 3 threads: total iterations = 143879939
  • 7 threads: total iterations = 283689344 --> 10 seconds

My guess is that those "extra" iterations are on same thread (possibly the first thread), causing it to compute more and more while all other threads getting mostly empty(1-2 iterations only) pixels for no speedup. As a rule of comparing Mandelbrot performance on different systems, the output image has to be identical. With more iterations between different settings, there has to be a different image output. Check if Mandelbrot surface is equally distributed to all threads or implement a dynamic load-balancing instead. Static load-balancing is not good on Mandelbrot-Set that has non-balanced workload per pixel.

If output images are identical, then possibly one of the threads are overwriting own results onto other threads' results. The work distribution is overlapping one or more threads' pixels with other threads' pixels. Overwriting/over-computing can also decrease performance (not to mention the data race).

You can try a very simple load-balancing:

std::atomic<int> workIndex{0};


// worker thread-1, worker thread-2, ... 
std::thread thr([&](){
    while(workIndex.load()<MAX_PIXELS)
    {
        // get last iteration position then add 1000, as a way of allocating so-called workload
        auto myWork = workIndex.fetch_add(1000);

        // start from myWork, 
        // compute 1000 pixels or just number of pixels remaining
        if(myWork<MAX_PIXELS)
            compute(myWork,std::min(1000, MAX_PIXELS-myWork);

        // progressPercentage  =
        //   ( 1000.0 / MAX_PIXELS ) * 100.00001
        //       mind the last digit for float ^
        calcProgress();
    }
});

no need to make work-balance calculations too complex for a problem that is already complex on the work amount per pixel.

CodePudding user response:

You can't base the y loop on anything shared between all threads or the threads interfere with each other. Instead you have to compute the start and end of the lambda and pass i as argument to the lambda:

    std::thread newThread([mandelbrot, calcHeight](int i) {
        //lambda function
        int a = i * calcHeight, b = (i   1) * calcHeight;
        for (int y = a; y < b;   y)
        {
            //std::cout << "Before compute\n";
            mandelbrot->compute_mandelbrot(-2.0, 1.0, 1.125, -1.125, a, b, $
        }
    }, i);

You also have to fix the progress computation in Mandelbrot.cpp:

progressPercentage = (double)progressCount / HEIGHT * 100   1;

The 1 is a hack to make the consumer actually finish because otherwise the code hangs at 100%, probably because it's something like 99.99999999%. Doubles are imprecise.

  • Related