CodePudding
sse
09-04
database
sse4 packed sum between int32_t and int16_t (sign extend to int32_t)
08-24
database
Efficient transpose of 2D nibble matrix?
07-05
OS
On uint64 to double conversion: Why is the code simpler after a shift right by 1?
07-03
Net
_mm_loadu_si32 not recognized by GCC on Ubuntu
06-26
Blockchain
In JWASM/MASM - pshufw produces Error A2030: Instruction or register not accepted in current CPU mod
06-25
Blockchain
SSE interleave/merge/combine 2 vectors using a mask, per-element conditional move?
06-13
Enterprise
What is the difference between PSHUFD and SHUFPD?
06-11
front end
Why does MOVD/MOVQ between GP and SIMD registers have quite high latency?
06-08
Enterprise
Is there a better way to detect any bits that are set in a 16-byte array of flags?
06-02
Net
Does RSQRTSS break the dependency on the destination register?
05-04
Software engineering
SIMD - how to add corresponding values from 2 vectors of different element widths (char or uint8_t a
04-12
Enterprise
count number of unique values in a 128bit avx vector, or detecting if all elements are equal?
04-12
Software engineering
count number of unique values in a 128bit avx vector
03-14
Software engineering
Replace `movss xmm0, cs:dword_5B27420` with `movss xmm0, immediate`
03-10
OS
Replace movss xmm0, cs:dword_5B27420 with immediate value
03-04
Software engineering
What instruction set does SFENCE belong to?
03-03
Back-end
Fast CRC with PCLMULQDQ *NOT* reflected
02-22
Enterprise
How to use SSE instruction set on C6678 DSP?
11-28
Software design
Set XMM register via address location for X86-64
11-20
Enterprise
Efficiently shift-or large bit vector
09-30
Mobile
int8 x uint8 matrix-vector product with column-major layout
09-28
Net
Is the "throughput" listed by Intel per thread or per core?
09-18
Software design
fast multiplication of int8 arrays by scalars