sse-CodePudding

09-04 | database | sse4 packed sum between int32_t and int16_t (sign extend to int32_t)
08-24 | database | Efficient transpose of 2D nibble matrix?
07-05 | OS | On uint64 to double conversion: Why is the code simpler after a shift right by 1?
07-03 | Net | _mm_loadu_si32 not recognized by GCC on Ubuntu
06-26 | Blockchain | In JWASM/MASM - pshufw produces Error A2030: Instruction or register not accepted in current CPU mod
06-25 | Blockchain | SSE interleave/merge/combine 2 vectors using a mask, per-element conditional move?
06-13 | Enterprise | What is the difference between PSHUFD SHUFPD
06-11 | front end | Why does MOVD/MOVQ between GP and SIMD registers have quite high latency?
06-08 | Enterprise | Is there a better way to any detect bits that are set in a 16-byte array of flags?
06-02 | Net | Does RSQRTSS break the dependency on the destination register?
05-04 | Software engineering | SIMD - how to add corresponding values from 2 vectors of different element widths (char or uint8_t a
04-12 | Enterprise | count number of unique values in a 128bit avx vector, or detecting if all elements are equal?
04-12 | Software engineering | count number of unique values in a 128bit avx vector
03-14 | Software engineering | Replace `movss xmm0, cs:dword_5B27420` with `movss xmm0, immediate`
03-10 | OS | Replace movss xmm0, cs:dword_5B27420 with immediate value
03-04 | Software engineering | What instruction set does SFENCE belong to?
03-03 | Back-end | Fast CRC with PCLMULQDQ *NOT* reflected
02-22 | Enterprise | How to use SSE instruction set on C6678 DSP?
11-28 | Software design | Set XMM register via address location for X86-64
11-20 | Enterprise | Efficiently shift-or large bit vector
09-30 | Mobile | int8 x uint8 matrix-vector product with column-major layout
09-28 | Net | Is the "throughput" listed by Intel per thread or per core?
09-18 | Software design | fast multiplication of int8 arrays by scalars