I am developing in Vulkan 1.0, building a rendering system by learning and implementing functionality one step at a time. I get the gist of command recording and submission, but I haven't gotten far enough to understand a use case in which I'd want multiple command buffers per pool. It was this presentation, at slide 14, that raised some questions.
My understanding and current design is as follows:
- Optimally, there should be one command pool per frame per thread so command buffers aren't recording over the same memory while in flight. If I have 3 frames and each frame can have up to 4 recording threads, that's 12 command pools at a minimum.
- Command buffers are associated with a command pool at creation time and will be reset on the next frame. To potentially get better performance, the entire pool will be reset rather than the individual buffers.
- A single command pool may be used in the creation of multiple command buffers. This group of command buffers would all be used in the same frame and thread.
- According to this article under "Command overlap", the reordering of commands may happen between command buffers and `vkQueueSubmit` calls. So if I had a group of command buffers in the same frame and thread, I'd need something more than just submission order to guarantee the results I want. Maybe I'd use unique semaphores for each submission?
- If I'm coding for a frame/thread, I see no advantage to submitting commands a few times from beginning to end as opposed to submitting everything at once in the end. It's the same amount of work in the same time span. It may even be detrimental to submit multiple times because of the `vkQueueSubmit` overhead mentioned in the specification.
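To make the pool arithmetic in the list above concrete, here is a minimal sketch of a frames-by-threads pool table. It deliberately avoids the Vulkan headers: `PoolSlot` and `PoolTable` are hypothetical names, and each slot is only a stand-in for a real `VkCommandPool` handle. The reset counter models a per-frame pool reset rather than performing one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one command pool. A real renderer would store
// the VkCommandPool handle returned by vkCreateCommandPool here.
struct PoolSlot {
    std::size_t frame;
    std::size_t thread;
    int resetCount;
};

// One pool per (frame, thread) pair, stored in a flat array.
class PoolTable {
public:
    PoolTable(std::size_t frames, std::size_t threads)
        : frames_(frames), threads_(threads) {
        slots_.reserve(frames * threads);
        for (std::size_t f = 0; f < frames; ++f)
            for (std::size_t t = 0; t < threads; ++t)
                slots_.push_back(PoolSlot{f, t, 0});
    }

    std::size_t size() const { return slots_.size(); }

    // Look up the pool owned by a given frame slot and recording thread.
    PoolSlot& at(std::size_t frame, std::size_t thread) {
        assert(frame < frames_ && thread < threads_);
        return slots_[frame * threads_ + thread];
    }

    // Models resetting every pool of one frame slot when that slot is
    // reused (in real code: one vkResetCommandPool call per pool).
    void resetFrame(std::size_t frame) {
        for (std::size_t t = 0; t < threads_; ++t)
            ++at(frame, t).resetCount;
    }

private:
    std::size_t frames_;
    std::size_t threads_;
    std::vector<PoolSlot> slots_;
};
```

With 3 frames and 4 threads, `PoolTable pools(3, 4)` holds 12 slots, and `pools.at(frame, thread)` is the only pool that thread touches while recording that frame.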
From the assumptions above, in what cases would it be necessary or advantageous to have more than one command buffer per command pool as opposed to having one command buffer that records everything from beginning to end for the given frame and thread?
CodePudding user response:
> having one command buffer that records everything from beginning to end for the given frame and thread?
Well, what happens if a thread needs to record things in an order other than the order in which they need to be submitted? That's kind of the point of a CB, isn't it? The ability to build commands in an order that is convenient, then submit them in the way that works out for the GPU.
For example, let's say you have a thread that is rendering a particular set of objects. To do that, you need to write their matrices and other per-object properties to a uniform buffer. And let's say that, for whatever reason, this particular Vulkan implementation doesn't allow you to use mappable memory directly for uniform buffers. So you have to write to mappable memory and copy the data to a uniform buffer via a memory transfer operation.
So the thread creating the commands for these meshes needs to do two things. It needs to build the commands to render the meshes, and it needs to build the commands to transfer the uniform data to the buffer that the rendering commands will read.
Your way, however, requires that commands be put into the CB in the order you want them executed. So you would have to loop through the entire list of objects to build the transfer commands, and then loop through it again to build the rendering commands. But you're reading the same objects each time through the loop. During the first loop, you had access to 100% of the data needed to issue the rendering command.
And by the second time through the loop, with a large enough object list, that data has likely been evicted from the cache. So the second pass incurs about the same number of cache misses (and therefore real memory accesses) as the first.
That's bad.
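As a rough illustration of the alternative, the sketch below visits each object once and records into two separate lists, then orders them only at submission time. Nothing here is real Vulkan: `Object`, `CommandList`, `recordOnePass`, and `submitInOrder` are hypothetical names, and a command buffer is modeled as a plain list of strings.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical scene object; only the name matters for this sketch.
struct Object {
    std::string name;
};

// Stand-in for a command buffer: an ordered list of command descriptions.
using CommandList = std::vector<std::string>;

struct Recorded {
    CommandList transfers;  // would be recorded outside any render pass
    CommandList draws;      // would be recorded for the render pass
};

// Single pass over the objects: each object is read once, while its data
// is still hot, and contributes to both lists.
Recorded recordOnePass(const std::vector<Object>& objects) {
    Recorded out;
    for (const Object& obj : objects) {
        out.transfers.push_back("copy uniforms for " + obj.name);
        out.draws.push_back("draw " + obj.name);
    }
    assert(out.transfers.size() == out.draws.size());
    return out;
}

// Submission order is chosen independently of recording order:
// every transfer goes ahead of every draw.
CommandList submitInOrder(const Recorded& rec) {
    CommandList queue(rec.transfers);
    queue.insert(queue.end(), rec.draws.begin(), rec.draws.end());
    return queue;
}
```

For objects `a` and `b`, the recording loop interleaves the two kinds of command, yet the submitted sequence is both copies followed by both draws. That decoupling is what a second command buffer buys you.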
Furthermore, rendering commands need to be placed within a render pass instance. Transfer commands cannot be in a render pass instance. But if you're putting transfer commands into the same CB as the rendering commands... that CB must begin and end the render pass instance.
So... how can other threads issue commands for that render pass instance?
If you want parallelism (and you do), then you need these threads to be creating secondary CBs for their rendering commands. A later task will collate them into the primary CB, and that CB will have the render pass instance. But secondary CBs built for a render pass cannot contain transfer commands.
So if you want parallelism, then any transfer commands that have to be generated alongside rendering commands must go into a different CB: one that will be submitted before the primary CB that executes the secondary CBs (or even submitted to a different queue altogether).
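The render pass constraint can be modeled in a few lines. This is a simplified sketch, not Vulkan code: `Cmd`, `WorkerOutput`, `isValidStream`, and `assembleFrame` are hypothetical names, the four command kinds are a drastic reduction of the real API, and each worker's draw list stands in for a secondary CB that the primary would execute with `vkCmdExecuteCommands`.

```cpp
#include <cassert>
#include <vector>

// Simplified command kinds; real Vulkan has many more.
enum class Cmd { BeginRenderPass, EndRenderPass, Draw, Copy };

// True if draws appear only inside a render pass, copies appear only
// outside one, and begin/end are properly paired.
bool isValidStream(const std::vector<Cmd>& stream) {
    bool inPass = false;
    for (Cmd c : stream) {
        switch (c) {
        case Cmd::BeginRenderPass:
            if (inPass) return false;
            inPass = true;
            break;
        case Cmd::EndRenderPass:
            if (!inPass) return false;
            inPass = false;
            break;
        case Cmd::Draw:
            if (!inPass) return false;
            break;
        case Cmd::Copy:
            if (inPass) return false;
            break;
        }
    }
    return !inPass;
}

// What one recording thread produces: a transfer list plus a draw list.
struct WorkerOutput {
    std::vector<Cmd> transfers;
    std::vector<Cmd> draws;
};

// Collate step: all transfers first, then a single render pass that
// contains every worker's draws.
std::vector<Cmd> assembleFrame(const std::vector<WorkerOutput>& workers) {
    std::vector<Cmd> frame;
    for (const WorkerOutput& w : workers)
        frame.insert(frame.end(), w.transfers.begin(), w.transfers.end());
    frame.push_back(Cmd::BeginRenderPass);
    for (const WorkerOutput& w : workers)
        frame.insert(frame.end(), w.draws.begin(), w.draws.end());
    frame.push_back(Cmd::EndRenderPass);
    assert(isValidStream(frame));  // sanity check on the collated result
    return frame;
}
```

A stream such as begin, copy, draw, end fails `isValidStream`, which is the situation a single interleaved CB would produce. Two workers that each contribute one copy and one draw collate into copy, copy, begin, draw, draw, end, which passes.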