Multithreading is slowing down OpenGL loop


I'm currently programming a Minecraft-like map generator using OpenGL in C++. My CPU is an AMD Ryzen 5 3600 (6 cores).

When I tried to add multithreading to my main loop to run my chunk generation, it slowed down the rendering. I'm not using multithreading for the rendering itself.

I'm on Windows, compiling with MinGW, and its multithreading is based on POSIX threads.

The problem is that I can't figure out why it makes my rendering slow. Even if I create ONLY ONE thread I lose FPS, and even if that thread does a trivial task like:

std::cout << "Thread #" << i << "\n";

It will slow down the rendering.

My compile flags are -pthread -lopengl32 -lglu32 -lgdi32 -luser32 -lkernel32 -lglfw3dll -O3

I should add that at school I work on macOS, where the multithreading does not slow down the rendering, so I'm assuming this is a MinGW problem.

If you have any kind of idea to help me, I'll take it. Thank you in advance for your responses!

Here is my loop:

while (!glfwWindowShouldClose(window))
    {
        Frustum frustum;
        //fps(window);
        float currentFrame = glfwGetTime();
        deltaTime = currentFrame - lastFrame;
        lastFrame = currentFrame;

        processInput(window);

        glClearColor(0.69f, 0.94f, 0.95f, 1.0f);
        glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

        shader.use();
        glm::mat4 projection = glm::perspective(glm::radians(camera.Zoom), (float)SCR_WIDTH / (float)SCR_HEIGHT, 0.1f, 1000.0f);
        shader.setMat4("projection", projection);
        glm::mat4 view = camera.GetViewMatrix();
        shader.setMat4("view", view);
        frustum.Transform(projection, view);
        shader.setVec3("lightPos", glm::vec3(0.7, 0.2, 0.5));
        shader.setVec3("viewPos", camera.Position);
        displayChunk(shader, vox, &chunks, frustum);

        glDepthFunc(GL_LEQUAL); // change depth function so depth test passes when values are equal to depth buffer's content
        skyboxShader.use();
        view = glm::mat4(glm::mat3(camera.GetViewMatrix())); // remove translation from the view matrix
        skyboxShader.setMat4("view", view);
        skyboxShader.setMat4("projection", projection);
        // skybox cube
        glBindVertexArray(skybox.skyboxVAO);
        glActiveTexture(GL_TEXTURE0);
        glBindTexture(GL_TEXTURE_CUBE_MAP, skybox.cubemapTexture);
        glDrawArrays(GL_TRIANGLES, 0, 36);
        glBindVertexArray(0);
        glDepthFunc(GL_LESS); // set depth function back to default

        glfwSwapBuffers(window);
        glfwPollEvents();
        
        for (unsigned i = 0; i < 1; ++i)
        {
            threads[i] = std::thread([&mtx, i]
                                     {
                                         {
                                             // Use a lexical scope and lock_guard to safely lock the mutex only for
                                             // the duration of std::cout usage.
                                             std::lock_guard<std::mutex> iolock(mtx);
                                             std::cout << "Thread #" << i << " is running\n";
                                         }
                                     });
        }

        for (auto &t : threads)
        {
            t.join();
        }
    }

CodePudding user response:

Threads and processes are basically the same thing under Linux; both are created internally via clone(). A thread is therefore not something cheap to create, and you are creating one on every iteration of your loop!
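
If you want to see this cost yourself, here is a small, illustrative benchmark (not part of the original question) that times the raw create-plus-join cost of an empty thread:

#include <chrono>
#include <iostream>
#include <thread>

int main()
{
    constexpr int iterations = 1000;

    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
    {
        std::thread t([] {}); // empty body: we only time creation + join
        t.join();
    }
    auto end = std::chrono::steady_clock::now();

    auto totalUs = std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
    std::cout << "Average create+join cost: " << totalUs / iterations << " us per thread\n";
}

Paying that cost every frame, as the loop above does, adds up quickly.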

Don't beat yourself up over it: the first generations of web servers (Apache, for instance) did the same thing, spawning a process or thread per connection. With time it became clear that just creating the process or thread was where most of the time went; it can take several milliseconds.

The next evolution was thread pools, and that's what I suggest you use. Create all the threads up front and have them sleep on a condition variable protected by a mutex. When you have work for them, push the data into their queue and notify the condition variable; a worker wakes up, takes the job, and runs it. This typically costs 3-10 microseconds, about a thousand times better than what you have right now.
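
A minimal sketch of such a pool, using only the standard library (the ThreadPool and enqueue names are just illustrative, not from any particular library), could look like this:

#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

class ThreadPool
{
public:
    explicit ThreadPool(std::size_t count)
    {
        for (std::size_t i = 0; i < count; ++i)
            workers.emplace_back([this] { workerLoop(); });
    }

    ~ThreadPool()
    {
        {
            std::lock_guard<std::mutex> lock(mtx);
            stopping = true;
        }
        cv.notify_all();
        for (auto &w : workers)
            w.join();
    }

    // Push a job; a sleeping worker wakes up and runs it.
    void enqueue(std::function<void()> job)
    {
        {
            std::lock_guard<std::mutex> lock(mtx);
            jobs.push(std::move(job));
        }
        cv.notify_one();
    }

private:
    void workerLoop()
    {
        for (;;)
        {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mtx);
                cv.wait(lock, [this] { return stopping || !jobs.empty(); });
                if (stopping && jobs.empty())
                    return;
                job = std::move(jobs.front());
                jobs.pop();
            }
            job(); // run the job outside the lock so other workers aren't blocked
        }
    }

    std::vector<std::thread> workers;
    std::queue<std::function<void()>> jobs;
    std::mutex mtx;
    std::condition_variable cv;
    bool stopping = false;
};

In your render loop you would keep a single ThreadPool alive for the whole program and call something like pool.enqueue([&]{ /* your chunk generation */ }); instead of constructing and joining std::thread objects every frame.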

You can write your own thread pool as in this tutorial, or you can use a predefined pool such as the one provided by boost::thread_pool.
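
For reference, one ready-made option is the pool shipped with Boost.Asio; assuming Boost is available, basic usage looks roughly like this:

#include <boost/asio/post.hpp>
#include <boost/asio/thread_pool.hpp>
#include <iostream>

int main()
{
    boost::asio::thread_pool pool(4); // four worker threads, created once

    boost::asio::post(pool, [] {
        std::cout << "chunk generation job runs on a pooled thread\n";
    });

    pool.join(); // wait for outstanding work, then stop the workers
}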

If you find that even 3-10 microseconds is too much, you can have your threads spin at 100% CPU and communicate through a lock-free container such as Boost's SPSC queue. The latency between threads then drops to a few dozen nanoseconds. The downside is that each spinning thread consumes 100% of one core even when it has nothing to do.
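
Here is a rough sketch of that pattern with boost::lockfree::spsc_queue (the ChunkRequest type and the commented-out generateChunk call are hypothetical placeholders for your own chunk code):

#include <boost/lockfree/spsc_queue.hpp>
#include <atomic>
#include <thread>

struct ChunkRequest { int x, z; }; // illustrative payload pushed by the render loop

boost::lockfree::spsc_queue<ChunkRequest,
                            boost::lockfree::capacity<1024>> requests;
std::atomic<bool> running{true};

void worker()
{
    ChunkRequest req;
    while (running.load(std::memory_order_relaxed))
    {
        // Busy-spin: burns one core but picks up new work within nanoseconds.
        if (requests.pop(req))
        {
            // generateChunk(req.x, req.z); // hypothetical chunk-generation call
        }
    }
    while (requests.pop(req)) { /* drain leftover requests before exiting */ }
}

int main()
{
    std::thread t(worker);
    requests.push(ChunkRequest{0, 0}); // producer side: the render loop
    running = false;
    t.join();
}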

CodePudding user response:

Thank you for your answer! I asked a friend of mine and he came to the same conclusion as you.

I will check the resources you linked, I really appreciate it !
