I'm new to assembly and SSE/AVX instructions. I want to set every location in a 256-bit YMM register to a specific value, but I'm not sure whether the result is correct.
- To set every bit of ymm0 to 0 or to 1:
__asm__ __volatile__(
    "vpxor %%ymm0, %%ymm0, %%ymm0\n\t"     // all are 0
    : : :);
or
__asm__ __volatile__(
    "VPCMPEQB %%ymm0, %%ymm0, %%ymm0\n\t"  // all are 1
    : : :);
GDB shows:
// all are 0
ymm0
{v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
v4_double = {0x0, 0x0, 0x0, 0x0},
v32_int8 = {0x0 <repeats 32 times>},
v16_int16 = {0x0 <repeats 16 times>},
v8_int32 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
v4_int64 = {0x0, 0x0, 0x0, 0x0},
v2_int128 = {0x0, 0x0}}
// all are 1
ymm0
{v8_float = {0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff},
v4_double = {0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff},
v32_int8 = {0xff <repeats 32 times>},
v16_int16 = {0xffff <repeats 16 times>},
v8_int32 = {0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff},
v4_int64 = {0xffffffffffffffff, 0xffffffffffffffff, 0xffffffffffffffff, 0xffffffffffffffff},
v2_int128 = {0xffffffffffffffffffffffffffffffff, 0xffffffffffffffffffffffffffffffff}}
- To set 0xA in all locations (both the high and low 128 bits) of ymm0:
__asm__ __volatile__(
    "movq $0xaaaaaaaaaaaaaaaa, %%rcx\n"
    "vmovq %%rcx, %%xmm0\n"
    "vpbroadcastq %%xmm0, %%ymm0\n"
    : : :);
GDB shows:
ymm0
{v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
v4_double = {0x0, 0x0, 0x0, 0x0},
v32_int8 = {0xaa <repeats 32 times>},
v16_int16 = {0xaaaa <repeats 16 times>},
v8_int32 = {0xaaaaaaaa, 0xaaaaaaaa, 0xaaaaaaaa, 0xaaaaaaaa, 0xaaaaaaaa, 0xaaaaaaaa, 0xaaaaaaaa, 0xaaaaaaaa},
v4_int64 = {0xaaaaaaaaaaaaaaaa, 0xaaaaaaaaaaaaaaaa, 0xaaaaaaaaaaaaaaaa, 0xaaaaaaaaaaaaaaaa},
v2_int128 = {0xaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa, 0xaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}}
Questions:
- What does the structure of the GDB output mean, e.g. v8_float, v4_double, v32_int8, etc.?
- In the second case (0xA), why are v8_float and v4_double always 0?
- How can I assign a value (e.g., 0xa) to every location in a YMM register, including both the high and low 128 bits?
P.S. The instruction reference I used: VPBROADCAST (Load Integer and Broadcast).
Answer:
First of all, your inline asm is broken: it's missing a "%ymm0" clobber to tell the compiler you wrote that register. You even used Extended asm syntax, asm("" : : :), to explicitly tell the compiler there were no clobbers. Or better, follow https://gcc.gnu.org/wiki/DontUseInlineAsm: write a separate function, or use intrinsics and look at the compiler-generated asm.
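A minimal sketch of both suggestions, assuming GCC or Clang targeting x86-64 with AVX2; the function names are made up for illustration. The asm versions keep the question's instructions but declare the register as clobbered (written "xmm0" here, which names the whole register, including the upper half of ymm0) and copy the result out through a 32-byte memory output operand so it can be checked. The intrinsics version lets the compiler choose the register and instruction itself.

```c
#include <immintrin.h>
#include <stdint.h>

/* The question's vpxor, with the clobber it was missing. The result is
   copied to out[] through a 32-byte "=m" output so the caller can inspect it. */
__attribute__((target("avx2")))
static void zero_asm(uint8_t out[32])
{
    __asm__("vpxor    %%ymm0, %%ymm0, %%ymm0\n\t"  /* all bits 0 */
            "vmovdqu  %%ymm0, %0"                  /* store ymm0 to out[] */
            : "=m"(*(uint8_t (*)[32])out)
            :
            : "xmm0");  /* the whole register, upper half of ymm0 included */
}

/* The question's vpcmpeqb, fixed the same way. */
__attribute__((target("avx2")))
static void ones_asm(uint8_t out[32])
{
    __asm__("vpcmpeqb %%ymm0, %%ymm0, %%ymm0\n\t"  /* all bits 1 */
            "vmovdqu  %%ymm0, %0"
            : "=m"(*(uint8_t (*)[32])out)
            :
            : "xmm0");
}

/* Intrinsics: no clobbers to get wrong, the compiler allocates the register. */
__attribute__((target("avx2")))
static void fill_intrin(uint8_t out[32], int ones)
{
    __m256i v = ones ? _mm256_set1_epi8(-1) : _mm256_setzero_si256();
    _mm256_storeu_si256((__m256i *)out, v);
}
```

Call these only on a CPU that has AVX2, for example after checking GCC/Clang's __builtin_cpu_supports("avx2").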
(1) v8_float means interpreting the 256 bits as a vector of 8x float, i.e. __m256 in Intel intrinsics (likewise v4_double corresponds to __m256d, and the integer views to __m256i). v32_int8 is a vector of 32x int8_t, printing each byte separately. You can use p /x $ymm0.v8_int32 if that's how you want to look at it.
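To make the GDB layout concrete, here is a small C illustration (not how GDB itself is implemented) in which one union overlays the same 32 bytes with each element type GDB lists; the union and function names are hypothetical, and v2_int128 is left out because __int128 is a compiler extension. Filling it with 0xaa bytes reproduces the integer views from the question.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The same 256 bits viewed as different element types,
   with field names copied from GDB's output. */
union ymm_view {
    float   v8_float[8];
    double  v4_double[4];
    int8_t  v32_int8[32];
    int16_t v16_int16[16];
    int32_t v8_int32[8];
    int64_t v4_int64[4];
};

static_assert(sizeof(union ymm_view) == 32, "a YMM register holds 256 bits");

/* Set every byte to the same value, like the 0xaa broadcast in the question. */
static union ymm_view ymm_fill(uint8_t byte)
{
    union ymm_view v;
    memset(&v, byte, sizeof v);
    return v;
}
```

Reading v.v8_int32[0] after ymm_fill(0xaa) gives the bit-pattern 0xaaaaaaaa, while v.v8_float[0] is a tiny negative float built from those same bits.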
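(2) The integer 0xa on its own would be the bit-pattern for a very tiny subnormal float or double. Your register actually holds 0xaaaaaaaa in each float element, a tiny negative normal float of about -3.03e-13, and 0xaaaaaaaaaaaaaaaa in each double, about -3e-103. Try putting that in as the "Hexadecimal Representation" on https://www.h-schmidt.net/FloatConverter/IEEE754.html. With p /x, your GDB evidently converts each floating-point value to an integer before printing it in hex, so anything with magnitude below 1 shows as 0x0; the same conversion is why the all-ones register shows v4_double as 0x7fffffffffffffff instead of its raw bits.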
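A small sketch to check those numbers, assuming IEEE-754 float and double (as on x86); the helper names are hypothetical. memcpy reinterprets the bits without any conversion, while a plain cast to an integer truncates toward zero, which is why a value of magnitude below 1 ends up as 0.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a float: same bits, no value conversion. */
static float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Reinterpret a 64-bit pattern as a double. */
static double bits_to_double(uint64_t bits)
{
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* 1 if the pattern is a subnormal float (e.g. the small integer 0xa). */
static int is_subnormal_float(uint32_t bits)
{
    return fpclassify(bits_to_float(bits)) == FP_SUBNORMAL;
}
```

For example, bits_to_float(0xaaaaaaaa) is roughly -3.03e-13, and casting it with (long long) yields 0, matching the 0x0 GDB printed.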
(3) You already did broadcast 0xa to all 64 nibbles in your 32-byte YMM register, as you can see from the v2_int128 = {0xaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa, 0xaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}} output: both halves are all 0xaa bytes.
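For comparison, a sketch of the question's broadcast with everything it overwrites declared (rcx and ymm0), next to an intrinsic that produces the same 32 bytes. As before, this assumes GCC or Clang on x86-64 with AVX2, and the function names are illustrative only.

```c
#include <immintrin.h>
#include <stdint.h>

/* The question's broadcast, declaring rcx and ymm0 (spelled "xmm0" in the
   clobber list) as overwritten, then storing the result to out[]. */
__attribute__((target("avx2")))
static void broadcast_aa_asm(uint8_t out[32])
{
    __asm__("movabsq      $0xaaaaaaaaaaaaaaaa, %%rcx\n\t"
            "vmovq        %%rcx, %%xmm0\n\t"   /* low qword of xmm0 */
            "vpbroadcastq %%xmm0, %%ymm0\n\t"  /* copy it to all 4 qwords */
            "vmovdqu      %%ymm0, %0"          /* store the 32 bytes */
            : "=m"(*(uint8_t (*)[32])out)
            :
            : "rcx", "xmm0");
}

/* The same 32 bytes with an intrinsic: every byte is 0xaa. */
__attribute__((target("avx2")))
static void broadcast_aa_intrin(uint8_t out[32])
{
    _mm256_storeu_si256((__m256i *)out, _mm256_set1_epi8((char)0xaa));
}
```

Both functions fill all 32 bytes, in both 128-bit halves, with 0xaa.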
If you actually wanted _mm256_set1_epi8(0x0a) (broadcast that to every byte), you should have written 0x0a0a0a0a instead of 0xaaaaaaaa. (There's no need to use a qword immediate; vpbroadcastd runs just as fast, but mov $0x0a0a0a0a,