avx-CodePudding


09-05 | Mobile | Quickest way to shift/rotate byte vector with SIMD
08-28 | front end | Does anyone have an example where _mm256_stream_load_si256 (non-temporal load to bypass cache) actua
06-04 | front end | How to set all the values in AVX ymm register to be the same (all are 0/1/specific value)?
05-30 | Enterprise | Best way to mask a single bit in AVX2?
05-01 | database | Implementing matrix operation using AVX in C
04-12 | Enterprise | Count number of unique values in a 128-bit AVX vector, or detecting if all elements are equal?
03-17 | Software engineering | __attribute__ ((vector_size)) magic being destroyed by union
02-23 | Back-end | AVX performance slower for bitwise xor op and popcount
12-24 | Software design | AVX2: CountTrailingZeros on 8 bit elements in AVX register
12-21 | Blockchain | GEMM kernel implemented using AVX2 is faster than AVX2/FMA on a Zen 2 CPU
12-21 | OS | GEMM kernel implemented using AVX2 is faster than AVX2/FMA on a Zen 2 CPU
12-08 | Blockchain | How to load into __m256 from a float* but reading backwards in memory as opposed to forwards?
11-20 | Enterprise | Efficiently shift-or large bit vector
11-20 | front end | How is the lvalue problem solved for SIMD inline asm with memory output operands in a 2D array?
11-17 | other | In assembly, how to add integers without destroying either operand?
10-08 | Enterprise | When source registers in AVX instruction can be reused
09-30 | Mobile | int8 x uint8 matrix-vector product with column-major layout
