When running deviceQuery, I want to understand the difference between "Maximum number of threads per multiprocessor" and "Maximum number of threads per block". As I understood it, SM = multiprocessor = block on the GPU, but then I do not understand why the two values are different. Are there multiple blocks in a multiprocessor?
Maximum number of threads per multiprocessor: 1536
Maximum number of threads per block: 1024
An additional question concerns the relationship between thread and core: is it correct to match thread = core?
CodePudding user response:
Are there multiple blocks in a multiprocessor?
Yes, there can be.
Quite simply: SM == multiprocessor, but SM != block.
An SM (multiprocessor) is a hardware entity. A threadblock is a software entity, basically a collection of threads.
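An SM (multiprocessor) can have more than one block resident at a time. To reach full occupancy on an SM with a maximum of 1536 threads, you would need something like three 512-thread blocks resident. A single 1024-thread block would leave 512 of the 1536 thread slots unused, because a second 1024-thread block would not fit.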
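The arithmetic behind that can be sketched in a few lines of plain C++. This is only a rough model: the function names are made up for illustration, and it considers the two thread limits alone. Real hardware also caps the number of blocks per SM and the registers and shared memory available, so the actual number of resident blocks may be lower.

```cpp
#include <cassert>

// Upper bound on resident blocks per SM, looking ONLY at the thread limits.
// Hypothetical helper for illustration; registers, shared memory, and the
// hardware cap on blocks per SM are deliberately ignored here.
int max_resident_blocks_by_threads(int max_threads_per_sm,
                                   int max_threads_per_block,
                                   int threads_per_block) {
    // A block can never be larger than the per-block limit.
    assert(threads_per_block > 0 && threads_per_block <= max_threads_per_block);
    // Whole blocks only: a block is either fully resident or not resident.
    return max_threads_per_sm / threads_per_block;
}

// Fraction of the SM's thread slots filled by those resident blocks.
double thread_occupancy(int max_threads_per_sm,
                        int max_threads_per_block,
                        int threads_per_block) {
    int blocks = max_resident_blocks_by_threads(
        max_threads_per_sm, max_threads_per_block, threads_per_block);
    return static_cast<double>(blocks * threads_per_block) / max_threads_per_sm;
}
```

With the values from the question (1536 and 1024), a block size of 512 gives three resident blocks and a thread occupancy of 1.0, while a block size of 1024 gives one resident block and an occupancy of about 0.67.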
An additional question concerns the relationship between thread and core: is it correct to match thread = core?
A thread represents a sequence of instructions. A "core" in GPU-speak is a functional unit in the SM which processes certain instruction types, namely 32-bit floating point add, multiply, and multiply-add instructions. Other instruction types are handled by other kinds of functional units in the SM.
A thread will require a core when it has one of those 32-bit floating point instruction types to process. If it happens to have a different instruction to process, say a LD (load) instruction, it will require a different functional unit, in that example a LD/ST (load/store) unit. So it is not correct to equate a thread with a core.
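As a toy illustration of that distinction, here is a minimal C++ sketch. It is not a real GPU API: the `Instr` and `Unit` enums and both functions are hypothetical names invented here. It models a thread as a list of instructions and maps each instruction type to the kind of functional unit it would need, following the description above.

```cpp
#include <vector>

// Hypothetical instruction types, for illustration only.
enum class Instr { FADD32, FMUL32, FFMA32, LD, ST, OTHER };

// Hypothetical kinds of functional units inside an SM.
enum class Unit { Fp32Core, LoadStoreUnit, OtherUnit };

// Which kind of functional unit a given instruction type needs.
Unit unit_for(Instr i) {
    switch (i) {
        case Instr::FADD32:
        case Instr::FMUL32:
        case Instr::FFMA32:
            return Unit::Fp32Core;       // what GPU-speak calls a "core"
        case Instr::LD:
        case Instr::ST:
            return Unit::LoadStoreUnit;  // LD/ST unit
        default:
            return Unit::OtherUnit;      // everything else
    }
}

// A thread is a sequence of instructions; count how many of them need a core.
int count_core_instructions(const std::vector<Instr>& thread) {
    int n = 0;
    for (Instr i : thread) {
        if (unit_for(i) == Unit::Fp32Core) {
            ++n;
        }
    }
    return n;
}
```

For a thread that does a load, a multiply-add, an add, and a store, only the two floating point instructions need a core in this model; the load and the store go to a LD/ST unit instead.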