I am processing a 2D image pixel by pixel and need the fastest way to read pixel values from it in Metal as a texture2d. Is it faster to sample from the texture or to read from it directly? Reading would require converting coordinates from float2 to uint2, but since I don't need interpolation, that would certainly be preferable.
Which is faster, sample or read? Also, what's the best sampler to use in this context?
Thanks a lot!
CodePudding user response:
It depends on the device's memory architecture, that is, how memory is shared between the CPU and the GPU. When a compute kernel processes a texture, the work is divided into threadgroups, and each threadgroup is composed of individual threads. Each thread processes a single pixel. The threads in a threadgroup are further organized into single-instruction, multiple-data (SIMD) groups, also known as warps or wavefronts, that execute concurrently.
Which is faster, sample or read?
In most cases read is faster.
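As a rough illustration of the one-thread-per-pixel layout described above, here is a minimal sketch of a compute kernel that uses read. The kernel name, the texture indices, and the invert operation are placeholders I made up for the example, not anything from the question:

```metal
#include <metal_stdlib>
using namespace metal;

// Hypothetical kernel name and texture indices; adapt them to your pipeline.
kernel void processWithRead(texture2d<float, access::read>  source [[texture(0)]],
                            texture2d<float, access::write> dest   [[texture(1)]],
                            uint2 gid [[thread_position_in_grid]])
{
    // Skip threads that fall outside the image when the grid is rounded up
    // to a multiple of the threadgroup size.
    if (gid.x >= source.get_width() || gid.y >= source.get_height()) {
        return;
    }

    // read() takes integer pixel coordinates and returns the texel unfiltered.
    float4 color = source.read(gid);

    // Placeholder per-pixel operation (an assumption): invert the RGB channels.
    dest.write(float4(1.0f - color.rgb, color.a), gid);
}
```

Because thread_position_in_grid is already a uint2, no float2 to uint2 conversion is needed as long as the grid is dispatched to match the texture size.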
CodePudding user response:
Sampling a texture, especially one with mipmaps, can cause several of its texels to be read from memory, so it sounds like read should be faster. At the same time, if your sampling locations are coherent, you will keep hitting the same texels, which will be cached. Sampling also involves some averaging of those texels. The best way to find out which is faster is to measure: you can use Metal System Trace or the Xcode GPU debugger for that.
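If you want a sampling variant to profile, a sampler along these lines should avoid the averaging and return one texel per lookup. This is only a sketch: the sampler and kernel names and the texture indices are assumptions for illustration.

```metal
#include <metal_stdlib>
using namespace metal;

// Nearest filtering returns a single texel without blending neighbors;
// coord::pixel lets the sampler take unnormalized pixel coordinates.
constexpr sampler pixelSampler(coord::pixel,
                               address::clamp_to_edge,
                               filter::nearest);

// Hypothetical kernel name and texture indices.
kernel void processWithSample(texture2d<float, access::sample> source [[texture(0)]],
                              texture2d<float, access::write>  dest   [[texture(1)]],
                              uint2 gid [[thread_position_in_grid]])
{
    // Skip threads that fall outside the destination image.
    if (gid.x >= dest.get_width() || gid.y >= dest.get_height()) {
        return;
    }

    // Texel centers sit at half-integer positions in pixel coordinates.
    float2 position = float2(gid) + 0.5f;
    float4 color = source.sample(pixelSampler, position);

    dest.write(color, gid);
}
```

With filter::nearest and no mip filter specified, each sample should resolve to exactly one texel of the base level, which makes it a reasonable candidate to time on your target device.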
If you are processing a texture pixel by pixel, I would suggest reading it instead of sampling, for correctness: read returns the exact texel value with no filtering.
If you are on a TBDR (tile-based deferred rendering) device, you can also use tile dispatches to run one thread per pixel of an attachment.