Question: the result of the mulps instruction does not match what I expected

With the instruction sequence shown above, xmm0 is 0 after xmm0 and xmm1 are multiplied. Why is that?

CodePudding user response:

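What result were you expecting? mulps is a packed (parallel) multiplication of single-precision floating-point numbers, not of integers. If the registers hold small integers, their bit patterns read as floats are denormals, and the product of two denormals underflows to 0. For integers you should use PMULDQ/VPMULDQ.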
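To illustrate the point, here is a minimal C sketch using SSE2 intrinsics. It assumes an x86 compiler that provides `<emmintrin.h>`, and the helper name `mulps_on_int_bits` is made up for this example. It feeds integer bit patterns to the intrinsic that maps to mulps and hands back the raw result bits.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; also pulls in the SSE ones */
#include <stdint.h>

/* Hypothetical helper: treat four 32-bit integers as floats and multiply
 * them lane by lane, which is what MULPS does to whatever bits it is given.
 * The raw bits of the four products are written to out_bits. */
static void mulps_on_int_bits(const int32_t a[4], const int32_t b[4],
                              uint32_t out_bits[4])
{
    /* Reinterpret the integer bits as floats; no conversion takes place. */
    __m128 fa = _mm_castsi128_ps(_mm_loadu_si128((const __m128i *)a));
    __m128 fb = _mm_castsi128_ps(_mm_loadu_si128((const __m128i *)b));

    /* Small integers look like denormal floats, so each product underflows. */
    __m128 product = _mm_mul_ps(fa, fb);

    _mm_storeu_si128((__m128i *)out_bits, _mm_castps_si128(product));
}
```

With inputs such as {2, 3, 4, 5} and {6, 7, 8, 9}, every lane comes back as all-zero bits, which matches the 0 seen in xmm0.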

CodePudding user response:

OK, thank you. I looked at the Intel manuals: PMULDQ is an SSE4.1 instruction, and it stores its results as quadwords. Is there an instruction in an older SSE version that multiplies doublewords and stores the results as doublewords? I would think this operation is quite common, but I searched for a long time and could not find such an instruction.

CodePudding user response:

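If you have to stay within SSE2, you can use PMULUDQ, but that is an unsigned 32-bit integer multiplication. Alternatively, convert the integers to double-precision floating point with CVTDQ2PD (do not convert to single precision, because the results may be inexact, unless you can accept that), then multiply with MULPD, and convert back with CVTSD2SI. That instruction converts only one result at a time, so the other result has to be shifted or shuffled into the low position first, and the 64-bit destination form is only available in 64-bit mode. Do not use CVTPD2DQ for the conversion back: it can only put the results into 32-bit integers, while the product of two 32-bit integers may need up to 64 bits. Note also that a double has a 53-bit significand, so the floating-point route is exact only while the product stays below 2^53 in magnitude.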
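The options above could be sketched in C with SSE2 intrinsics roughly as follows. This is only a sketch under some assumptions: the function names are invented for the example, `_mm_cvtsd_si64` needs a 64-bit x86 target, and the `mul_i32_low` shuffle trick is a commonly used workaround that I am adding here, not something taken from the Intel manual. It relies on the low 32 bits of a product being the same for signed and unsigned inputs.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* PMULUDQ: full 64-bit unsigned products of lanes 0 and 2 only. */
static void mul_u32_wide(const uint32_t a[4], const uint32_t b[4],
                         uint64_t out[2])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_mul_epu32(va, vb));
}

/* Doubleword results for all four lanes, built from two PMULUDQ plus
 * shuffles. Only the low 32 bits of each product are kept. */
static void mul_i32_low(const int32_t a[4], const int32_t b[4],
                        int32_t out[4])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);

    /* Products of lanes 0 and 2. */
    __m128i even = _mm_mul_epu32(va, vb);
    /* Shift lanes 1 and 3 down into positions 0 and 2, then multiply. */
    __m128i odd = _mm_mul_epu32(_mm_srli_si128(va, 4),
                                _mm_srli_si128(vb, 4));

    /* Gather the low dword of each 64-bit product into lanes 0 and 1. */
    even = _mm_shuffle_epi32(even, _MM_SHUFFLE(0, 0, 2, 0));
    odd  = _mm_shuffle_epi32(odd,  _MM_SHUFFLE(0, 0, 2, 0));

    /* Interleave back into the original lane order: p0, p1, p2, p3. */
    _mm_storeu_si128((__m128i *)out, _mm_unpacklo_epi32(even, odd));
}

/* Signed products of two lanes via CVTDQ2PD, MULPD, CVTSD2SI.
 * Exact only while each product fits in the 53-bit significand. */
static void mul_i32_via_double(const int32_t a[2], const int32_t b[2],
                               int64_t out[2])
{
    __m128i va = _mm_loadl_epi64((const __m128i *)a);
    __m128i vb = _mm_loadl_epi64((const __m128i *)b);
    __m128d p = _mm_mul_pd(_mm_cvtepi32_pd(va), _mm_cvtepi32_pd(vb));

    out[0] = _mm_cvtsd_si64(p);                      /* low result  */
    out[1] = _mm_cvtsd_si64(_mm_unpackhi_pd(p, p));  /* high result */
}
```

Which one fits depends on what you need: `mul_u32_wide` gives only two products per instruction but keeps all 64 bits, `mul_i32_low` gives four products but drops the upper halves, and `mul_i32_via_double` handles signed inputs at the cost of the conversions.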