simd-CodePudding

09-14 | Enterprise | SIMD Intrinsics difference between Vector<T>, advsimd and sse?
09-05 | Mobile | Quickest way to shift/rotate byte vector with SIMD
08-28 | front end | Does anyone have an example where _mm256_stream_load_si256 (non-temporal load to bypass cache) actua
08-24 | database | Efficient transpose of 2D nibble matrix?
07-26 | Software engineering | Is uops.info wrong about vinserti128?
07-16 | Software engineering | How to find the matching element in a tiny array with unique elements as quickly as possible?
07-07 | Blockchain | Cannot make rustc use simd instructions for inclusive range loops
07-02 | Mobile | cuda SIMD instruction for per-byte multiplication with unsigned saturation
06-25 | Blockchain | SSE interleave/merge/combine 2 vectors using a mask, per-element conditional move?
06-11 | OS | AVX2 - storing integers at arbitrary indices in an array
06-11 | front end | Why does MOVD/MOVQ between GP and SIMD registers have quite high latency?
06-08 | Enterprise | Is there a better way to any detect bits that are set in a 16-byte array of flags?
06-05 | OS | Is vfmadd132pd slow on AMD Zen 3 architecture?
05-30 | Enterprise | Best way to mask a single bit in AVX2?
05-24 | Enterprise | Multiplication of complex numbers using AVX2 FMA3
05-24 | Mobile | Multiplication of complex numbers using AVX2
05-20 | database | Why does this code execute more slowly after strength-reducing multiplications to loop-carried addit
05-19 | database | aarch64-gcc simd inline asm, result always 0
05-18 | Blockchain | Sudden jump in sin function frequency
05-14 | other | Program in assembly x86
05-04 | Software design | How to create a left-packed vector of indices of the 0s in one SIMD vector?
05-04 | Software engineering | SIMD - how to add corresponding values from 2 vectors of different element widths (char or uint8_t a
05-02 | Back-end | Is there a best way to deal with undefined behavior in bitwise conversion between floats and integer
05-01 | database | Implementing matrix operation using AVX in C
04-19 | OS | How do I interpret the instruction `mov v2.2d[0],x14` in aarch64 assembly?
04-17 | Software engineering | Code duplication issue without define macro
04-12 | Enterprise | count number of unique values in a 128bit avx vector, or detecting if all elements are equal?
04-12 | Software engineering | count number of unique values in a 128bit avx vector
04-07 | OS | C Loop vectorization - counting matches of 7-byte records with masking
03-23 | Net | Is there a fast way to convert a string of 8 ASCII decimal digits into a binary number?
03-22 | front end | Is there a fast way to pack multiple digits into a number?
03-22 | OS | fast bit-matrix (64x64) transpose algorithm using SIMD (ARM)
03-21 | Mobile | fast bitwise 64x64 bit-matrix transpose algorithm using SIMD (ARM)
03-17 | Software engineering | __attribute__ ((vector_size)) magic being destroyed by union
03-11 | database | How to interpret uops.info?
02-23 | Back-end | AVX performance slower for bitwise xor op and popcount
02-22 | Enterprise | How to use SSE instruction set on C6678 DSP?
12-31 | Software engineering | Discrepancy in result of Intrinsics vs Naive Vector reduction
12-25 | Back-end | Faster memcpy assuming readability and writability of bytes past source and destination buffers
12-24 | Software design | AVX2: CountTrailingZeros on 8 bit elements in AVX register
