AVX Search Array UB with zero input

Time:10-29

I'm trying to search an array with AVX:

 __attribute__((target("avx512bw"))) int search(int* nums, int numsSize, int target) {
   // align nums 
   int arr[16] __attribute__((aligned(512)));
   __builtin_memcpy(arr, nums, numsSize*sizeof(int));
   // build vectors
   const __m512i valueVec = _mm512_set1_epi32(target);
   const __m512i searchVec = _mm512_load_epi32(&arr[0]);    
   // compare
   const __mmask16 equalBits = _mm512_cmpeq_epi32_mask(searchVec, valueVec);
   return equalBits;
}

When I have a 0 in the input for nums, like [0,1,3,5,9,12], and target=0, I get wrong results that are close to high powers of 2: 33282, 33281, 2692.

Is this due to the undefined bits in searchVec? Like it matches on the first zero of the ones not filled because my input does not fill the vector completely?

Also is there a way to convert the equalBits bitmask, which is 1,2,4,8,16, to the vector's index of the matching value, like 1,2,3,4,5? I tried _tzcnt_u32( (unsigned int) equalBits) but it looks like it needs to be cast to a vector, unsigned int __X.

Answer:

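Yes. memcpy only fills the first numsSize elements of arr, so the remaining lanes hold whatever happened to be on the stack, and any of them that equals the target also sets a bit. You need to mask off the unused elements.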

#include <immintrin.h>

__attribute__((target("avx512f"))) int search(int* nums, int numsSize, int target) {
   // mask unused values -- assumes 0 <= numsSize <= 16
   const __mmask16 mask = static_cast<__mmask16>((1u << numsSize) - 1);
   // build vectors
   const __m512i valueVec = _mm512_set1_epi32(target);
   // masked-out lanes are zeroed and never read from memory
   const __m512i searchVec = _mm512_maskz_loadu_epi32(mask, nums);
   // compare only the lanes selected by mask
   const __mmask16 equalBits = _mm512_mask_cmpeq_epi32_mask(mask, searchVec, valueVec);
   return equalBits;
}

You don't need to copy to an aligned temporary array; you can use the loadu (for "unaligned") intrinsics.
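To check a vectorized version against something simple, a scalar reference that builds the same bitmask can help. This is only a sketch: `tail_mask` and `search_scalar` are hypothetical helper names, not part of any library, and the clamping of out-of-range sizes is my own assumption about what you would want.

```cpp
#include <cstdint>

// Mask with the low n bits set. Sizes outside 0..16 are clamped
// (an assumption made for this sketch).
std::uint16_t tail_mask(int n) {
    if (n <= 0) return 0;
    if (n >= 16) return 0xFFFF;
    return static_cast<std::uint16_t>((1u << n) - 1);
}

// Scalar reference: bit i of the result is set when nums[i] == target.
// Only the first min(n, 16) elements are examined, so no bit can come
// from memory past the end of the input.
std::uint16_t search_scalar(const int* nums, int n, int target) {
    std::uint16_t bits = 0;
    for (int i = 0; i < n && i < 16; ++i) {
        if (nums[i] == target) {
            bits |= static_cast<std::uint16_t>(1u << i);
        }
    }
    return bits;
}
```

For the input [0,1,3,5,9,12] with target 0, `search_scalar` returns 1 (only bit 0 set), which is what the masked AVX-512 version should return as well.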


is there a way to convert the equalBits bitmask, which is 1,2,4,8,16, to the vector's index of the matching value, like 1,2,3,4,5?

If you have more than one match, an easy way is to make a vector of indices and then compress it with the mask:

auto const indices = _mm512_set_epi32(15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
// Assuming exactly one match; with more matches, store to an array of up to 16 ints instead:
int index;
_mm512_mask_compressstoreu_epi32(&index, equalBits, indices);
// See also _mm512_maskz_compress_epi32, which returns a zmm instead of storing to a pointer:
// __m512i matchedIndices = _mm512_maskz_compress_epi32(equalBits, indices);
// int index = _mm_cvtsi128_si32(
//          _mm256_castsi256_si128(
//            _mm512_castsi512_si256(matchedIndices)));

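If you're doing this in a loop over 16-element chunks, use _mm512_set1_epi32(16) and add that to indices after each iteration, so the compressed values are positions in the whole array rather than within the current chunk.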
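The loop variant could look roughly like the sketch below, which collects every matching position in an array of any length. The function names (`find_all`, `find_all_avx512`, `find_all_scalar`) are made up for illustration, and the runtime check with `__builtin_cpu_supports` plus the scalar fallback are my additions, assuming GCC or Clang; they are not part of the original answer.

```cpp
#include <vector>
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#endif

// Scalar fallback, used when AVX-512F is not available.
static std::vector<int> find_all_scalar(const int* nums, int n, int target) {
    std::vector<int> out;
    for (int i = 0; i < n; ++i)
        if (nums[i] == target) out.push_back(i);
    return out;
}

#if defined(__x86_64__) || defined(__i386__)
__attribute__((target("avx512f")))
static std::vector<int> find_all_avx512(const int* nums, int n, int target) {
    std::vector<int> out;
    const __m512i valueVec = _mm512_set1_epi32(target);
    const __m512i step = _mm512_set1_epi32(16);
    __m512i indices = _mm512_set_epi32(15, 14, 13, 12, 11, 10, 9, 8,
                                       7, 6, 5, 4, 3, 2, 1, 0);
    for (int i = 0; i < n; i += 16) {
        // Full mask for complete chunks, low bits only for the tail.
        const int remaining = n - i;
        const __mmask16 live = remaining >= 16
            ? static_cast<__mmask16>(0xFFFF)
            : static_cast<__mmask16>((1u << remaining) - 1);
        const __m512i chunk = _mm512_maskz_loadu_epi32(live, nums + i);
        const __mmask16 hits = _mm512_mask_cmpeq_epi32_mask(live, chunk, valueVec);
        // Pack the indices of matching lanes to the front of buf.
        int buf[16];
        _mm512_mask_compressstoreu_epi32(buf, hits, indices);
        out.insert(out.end(), buf, buf + __builtin_popcount(hits));
        // Advance lane indices to the next chunk's positions.
        indices = _mm512_add_epi32(indices, step);
    }
    return out;
}
#endif

// Dispatches at run time so the program also works on CPUs without AVX-512.
std::vector<int> find_all(const int* nums, int n, int target) {
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx512f"))
        return find_all_avx512(nums, n, target);
#endif
    return find_all_scalar(nums, n, target);
}
```

The tail mask does the same job as the mask in the single-vector version: lanes past the end of the array are neither loaded nor compared, so the last chunk does not need padding.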

If you have exactly one match, then you're correct about tzcnt and just need to cast the mask to an int. The unsigned int __X in the header is an ordinary integer parameter, not a vector:

_tzcnt_u32(static_cast<uint32_t>(equalBits))
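One thing to watch for is an empty mask: `_tzcnt_u32(0)` returns 32, which is not a valid lane. A minimal sketch of handling that case, and of walking all set bits when there may be several matches, follows. `first_match` and `all_matches` are hypothetical names, and the sketch uses the GCC/Clang builtin `__builtin_ctz` in place of `_tzcnt_u32` so it compiles without BMI1 enabled.

```cpp
#include <cstdint>
#include <vector>

// Lane index of the lowest set bit, or -1 when no lane matched.
// __builtin_ctz is undefined for 0, so the empty mask is handled first.
int first_match(std::uint16_t mask) {
    return mask == 0 ? -1 : __builtin_ctz(mask);
}

// Every set lane index, in ascending order.
std::vector<int> all_matches(std::uint16_t mask) {
    std::vector<int> lanes;
    unsigned m = mask;
    while (m != 0) {
        lanes.push_back(__builtin_ctz(m));
        m &= m - 1;  // clear the lowest set bit
    }
    return lanes;
}
```

Applied to the value 33282 from the question, `all_matches` yields lanes 1, 9 and 15.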