OpenGL: Batch Renderer: Should Transformations Take place on the CPU or GPU?

Time:11-24

I am developing a 2D game engine that will support 3D in the future, and I am currently working on the batch renderer. As some of you may know, once sprites are batched into a single draw call, per-object uniforms for color (RGBA), texture coordinates, texture ID (texture index), and the model transformation matrix can no longer be used, so that data has to be passed through the vertex buffer instead. So far I pass each model's position, color, texture coordinates, and texture ID through the vertex buffer. My vertex format currently looks like this:

float v0[] = {x, y, r, g, b, a, u, v, textureID};
float v1[] = {x, y, r, g, b, a, u, v, textureID};
float v2[] = {x, y, r, g, b, a, u, v, textureID};
float v3[] = {x, y, r, g, b, a, u, v, textureID};

I am about to integrate calculating where the object should be in world space using a transformation matrix. This leads me to ask the question:

Should the transformation matrix be multiplied by the model vertex positions on the CPU or GPU?

Something to keep in mind: if I pass the matrix through the vertex buffer, I have to upload it once per vertex (4 times per sprite), which seems like a waste of memory. On the other hand, multiplying the model vertex positions by the transformation matrix on the CPU seems like it would be slower than letting the GPU do the same work in parallel.
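To make the CPU-side option concrete, here is a minimal sketch of transforming one quad before it is written to the vertex buffer. The names `Affine2D`, `makeTransform`, and `transformQuad` are hypothetical and not from the question's engine, and the sketch assumes a 2D affine transform and a unit quad centred on the origin rather than a full 4x4 matrix.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

struct Vec2 { float x, y; };

// Hypothetical 2D affine transform laid out as
//   | a  c  tx |
//   | b  d  ty |
struct Affine2D { float a, b, c, d, tx, ty; };

// Builds translate * rotate * scale (scale is applied to the point first).
inline Affine2D makeTransform(Vec2 position, float radians, Vec2 scale) {
    assert(std::isfinite(radians));
    const float cs = std::cos(radians);
    const float sn = std::sin(radians);
    return { cs * scale.x, sn * scale.x, -sn * scale.y, cs * scale.y,
             position.x, position.y };
}

inline Vec2 apply(const Affine2D& m, Vec2 p) {
    return { m.a * p.x + m.c * p.y + m.tx,
             m.b * p.x + m.d * p.y + m.ty };
}

// Transforms the four corners of a unit quad centred on the origin.
inline std::array<Vec2, 4> transformQuad(const Affine2D& m) {
    static constexpr Vec2 corners[4] = {
        {-0.5f, -0.5f}, {0.5f, -0.5f}, {0.5f, 0.5f}, {-0.5f, 0.5f}
    };
    std::array<Vec2, 4> out{};
    for (std::size_t i = 0; i < 4; ++i) {
        out[i] = apply(m, corners[i]);
    }
    return out;
}
```

In this sketch each sprite costs four calls to `apply`, that is 16 multiplications and 16 additions, and the resulting positions go straight into the existing `x, y` slots, so the vertex format does not grow. Whether that beats the GPU path depends on the sprite count and on how often the buffer is re-uploaded anyway.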

This is what my vertex buffer format would look like if I calculated the transform on the GPU:

float v0[] = {x, y, r, g, b, a, u, v, textureID, m0, m1, m2, m3, m4, m5, m6, m7, m8, m9, m10, m11, m12, m13, m14, m15};
float v1[] = {x, y, r, g, b, a, u, v, textureID, m0, m1, m2, m3, m4, m5, m6, m7, m8, m9, m10, m11, m12, m13, m14, m15};
float v2[] = {x, y, r, g, b, a, u, v, textureID, m0, m1, m2, m3, m4, m5, m6, m7, m8, m9, m10, m11, m12, m13, m14, m15};
float v3[] = {x, y, r, g, b, a, u, v, textureID, m0, m1, m2, m3, m4, m5, m6, m7, m8, m9, m10, m11, m12, m13, m14, m15};
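To put a number on the memory concern, the two layouts above can be written as structs and compared. This is only a sketch: the struct names and the `batchBytes` helper are hypothetical, and the byte figures assume a 4-byte `float`.

```cpp
#include <cassert>
#include <cstddef>

// Layout from the question: position, colour, UV, texture slot (9 floats).
struct VertexBase {
    float x, y;
    float r, g, b, a;
    float u, v;
    float textureID;
};

// Same layout with a full 4x4 matrix duplicated into every vertex (25 floats).
struct VertexWithMatrix {
    VertexBase base;
    float m[16];
};

static_assert(sizeof(VertexBase) == 9 * sizeof(float), "unexpected padding");
static_assert(sizeof(VertexWithMatrix) == 25 * sizeof(float), "unexpected padding");

// Bytes of vertex data for spriteCount quads at 4 vertices per sprite.
inline std::size_t batchBytes(std::size_t spriteCount, std::size_t vertexSize) {
    assert(vertexSize % sizeof(float) == 0);
    return spriteCount * 4 * vertexSize;
}
```

With 4-byte floats, one sprite takes 144 bytes in the first layout and 400 bytes in the second, so the duplicated matrix adds 256 bytes per sprite, of which only 64 bytes carry unique data.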

The question is mostly theoretical, so a theoretical and technical answer would be much appreciated. But for reference, here is the code.

CodePudding user response:

Should Transformations Take place on the CPU or GPU?

It really depends on the situation at hand. If you resubmit your vertices every frame anyway, benchmark both approaches and see which is faster for your case. If you want to animate without resubmitting all your vertices, you have no choice but to apply the transformation on the GPU.

Whatever the reason, if you decide to apply the transformations on the GPU, there are better ways to do that than duplicating the matrix for each vertex. I'd instead put the transformation matrices in an SSBO:

layout(std430, binding=0) buffer Models {
    mat4 MV[]; // model-view matrices
};

and store a single index in each vertex in the vertex buffer:

float v0[] = {x, y, r, g, b, a, u, v, textureID, model};
float v1[] = {x, y, r, g, b, a, u, v, textureID, model};
float v2[] = {x, y, r, g, b, a, u, v, textureID, model};
float v3[] = {x, y, r, g, b, a, u, v, textureID, model};

The vertex shader can go and fetch the full matrix based on the index attribute:

layout(location = 0) in vec4 in_pos;
layout(location = 1) in int in_model;
void main() {
    gl_Position = MV[in_model] * in_pos;
}

You can even combine it with other per-object attributes, like the textureID.
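On the CPU side, the index approach could be filled in roughly as follows. This is a sketch under assumptions: `SpriteBatch`, `addSprite`, and `Mat4` are hypothetical names, the index is stored as a 32-bit integer rather than a float, and the actual buffer uploads are left out. One detail worth knowing: an integer shader input such as `in int in_model` has to be specified with `glVertexAttribIPointer`, because `glVertexAttribPointer` converts the data to floating point.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// 16 tightly packed floats; matches the 64-byte stride of mat4[] under std430.
using Mat4 = std::array<float, 16>;
static_assert(sizeof(Mat4) == 64, "Mat4 must be tightly packed");

struct Vertex {
    float x, y;
    float r, g, b, a;
    float u, v;
    float textureID;
    std::int32_t model;  // index into the matrix array read by the shader
};

struct SpriteBatch {
    std::vector<Vertex> vertices;  // contents destined for the vertex buffer
    std::vector<Mat4> matrices;    // contents destined for the SSBO

    // Appends one quad; all four vertices share a single matrix index.
    std::int32_t addSprite(const Mat4& modelView, const std::array<Vertex, 4>& corners) {
        assert(matrices.size() < 0x7fffffffu);
        const auto index = static_cast<std::int32_t>(matrices.size());
        matrices.push_back(modelView);
        for (Vertex v : corners) {
            v.model = index;  // overwrite whatever index the caller left there
            vertices.push_back(v);
        }
        return index;
    }
};
```

With this layout each sprite stores its matrix once (64 bytes) plus one 4-byte index per vertex, instead of four full copies of the matrix. Animating a sprite then only requires rewriting its entry in `matrices` and re-uploading that part of the SSBO.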

EDIT: You can achieve something similar with instancing and multi-draw, though it's likely to be slower.

CodePudding user response:

I'm not sure what your engine code actually looks like, but I assume it looks like any other OpenGL program.

If so, in my experience, the transform matrix should usually be passed to the vertex shader and applied to the vertex data on the GPU when you draw the scene. For example:

//MVP matrix
GLuint MatrixID = glGetUniformLocation(shaderProgID, "MVP");
glUniformMatrix4fv(MatrixID, 1, GL_FALSE, &mvp[0][0]);

But if you want to find the world coordinates of all the vertices in a specific group outside the rendering function, you probably need to do it on the CPU, or use a parallel programming technique such as OpenCL to do the work on the GPU.

The most important question is why you need the world coordinates outside the drawing procedure at all. If you simply want to know where a model is in the world, you can store a center coordinate for each model in the scene and track only that single coordinate rather than the whole mesh.
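As a small illustration of tracking a single coordinate: if the model's centre is its local origin, its world position is just the translation part of the model matrix. The helper below is a hypothetical sketch that assumes a column-major 4x4 affine matrix, which is the layout `glUniformMatrix4fv` expects when `transpose` is `GL_FALSE`.

```cpp
#include <array>
#include <cassert>

struct Vec3 { float x, y, z; };

// World-space position of the model's local origin (0, 0, 0).
// Assumes a column-major affine matrix, so the translation sits in
// elements 12, 13 and 14 and element 15 is 1.
inline Vec3 worldCenter(const std::array<float, 16>& model) {
    assert(model[15] == 1.0f);
    return { model[12], model[13], model[14] };
}
```

No per-vertex work is needed for this, so it costs the same regardless of how many vertices the mesh has.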

The vertex data should always be in model coordinates and stored in the vertex buffer untouched, unless you want to modify the vertices themselves.
