What is the difference between 'Maximum number of threads per multiprocessor' and 'Maximum number of threads per block'?

In the deviceQuery output, I want to understand the difference between "Maximum number of threads per multiprocessor" and "Maximum number of threads per block". As I understand it, SM = multiprocessor = block on the GPU, so I don't see why the two values differ. Are there multiple blocks in a multiprocessor?

  Maximum number of threads per multiprocessor:  1536
  Maximum number of threads per block:           1024

An additional question: what is the relationship between a thread and a core? Is it correct to equate thread = core?

Answer:

Are there multiple blocks in a multiprocessor?

Yes, there can be.

Quite simply, SM == multiprocessor, and SM != block.

An SM (multiprocessor) is a hardware entity. A threadblock is a software entity, basically a collection of threads.

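An SM (multiprocessor) can have more than one block resident at the same time. To get full occupancy of an SM with a maximum of 1536 threads, you would need something like three 512-thread blocks resident (3 × 512 = 1536). A single 1024-thread block, the largest allowed, would fill only two thirds of that SM, and a second one would not fit.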
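The arithmetic behind that can be sketched as follows. This is a simplified model that assumes the per-SM and per-block thread limits are the only constraints; real occupancy is also limited by registers, shared memory, and a per-SM cap on resident blocks, none of which are modeled here. The function names are hypothetical, chosen for illustration.

```python
# Simplified occupancy arithmetic: ONLY the two thread limits from deviceQuery
# are considered. Registers, shared memory, and the per-SM resident-block cap
# are deliberately ignored.

MAX_THREADS_PER_SM = 1536     # "Maximum number of threads per multiprocessor"
MAX_THREADS_PER_BLOCK = 1024  # "Maximum number of threads per block"


def resident_blocks(threads_per_block: int,
                    max_threads_per_sm: int = MAX_THREADS_PER_SM,
                    max_threads_per_block: int = MAX_THREADS_PER_BLOCK) -> int:
    """Whole blocks of this size that fit on one SM under the thread limit alone."""
    if not 0 < threads_per_block <= max_threads_per_block:
        raise ValueError("block size must be between 1 and the per-block limit")
    # Integer division: a partially fitting block cannot be resident.
    return max_threads_per_sm // threads_per_block


def thread_occupancy(threads_per_block: int,
                     max_threads_per_sm: int = MAX_THREADS_PER_SM,
                     max_threads_per_block: int = MAX_THREADS_PER_BLOCK) -> float:
    """Fraction of the SM's thread slots filled by the resident blocks."""
    blocks = resident_blocks(threads_per_block, max_threads_per_sm, max_threads_per_block)
    return blocks * threads_per_block / max_threads_per_sm


if __name__ == "__main__":
    for size in (256, 512, 768, 1024):
        print(f"{size:4d} threads/block -> {resident_blocks(size)} blocks resident, "
              f"{thread_occupancy(size):.0%} of thread slots used")
```

With the limits shown in the question, 512-thread blocks give 3 resident blocks and fill all 1536 slots, while 1024-thread blocks give only 1 resident block (about 67% of the slots). That gap is exactly why the two deviceQuery numbers differ.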

An additional question: what is the relationship between a thread and a core? Is it correct to equate thread = core?

A thread represents a sequence of instructions. A "core" in GPU-speak is a functional unit in the SM that processes certain instruction types, namely 32-bit floating-point add, multiply, and multiply-add instructions. Other instruction types are handled by other (kinds of) functional units in the SM.

A thread requires a core only when it has one of those 32-bit floating-point instructions to process. If it has a different instruction to process, say an LD (load) instruction, it requires a different functional unit: in this example, an LD/ST (load/store) unit. So thread != core.
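To make the "thread is a sequence of instructions, not a core" point concrete, here is a deliberately simplified toy model: each instruction in one thread's stream is routed to the kind of functional unit that handles it, and only the FP32 add/multiply/multiply-add cases use a "core". The opcode names are loosely modeled on NVIDIA's SASS mnemonics and are for illustration only; real hardware schedules warps of threads and has more unit types than shown.

```python
# Toy model (hypothetical, greatly simplified): route each instruction of a
# single thread to the kind of functional unit that would execute it.
from enum import Enum


class Unit(Enum):
    CORE = "FP32 core"
    LDST = "LD/ST unit"
    OTHER = "other functional unit"


# Illustrative opcode names, loosely SASS-style; not a complete instruction set.
FP32_OPS = {"FADD", "FMUL", "FFMA"}  # 32-bit float add, multiply, multiply-add
LDST_OPS = {"LD", "ST"}              # load and store


def unit_for(opcode: str) -> Unit:
    """Return the unit type this toy model assigns to an opcode."""
    if opcode in FP32_OPS:
        return Unit.CORE
    if opcode in LDST_OPS:
        return Unit.LDST
    return Unit.OTHER


def units_used(thread_instructions: list[str]) -> list[Unit]:
    """Map one thread's instruction stream to the unit each instruction needs."""
    return [unit_for(op) for op in thread_instructions]


if __name__ == "__main__":
    # Roughly c[i] = a[i] * b[i] + c[i] for one thread (address arithmetic omitted).
    program = ["LD", "LD", "LD", "FFMA", "ST"]
    for op, unit in zip(program, units_used(program)):
        print(f"{op:5s} -> {unit.value}")
```

In this example only one of the five instructions needs a core; the other four go to the LD/ST unit.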