CodePudding
Tag: simd
09-14
Enterprise
SIMD intrinsics: difference between Vector<T>, AdvSIMD and SSE?
09-05
Mobile
Quickest way to shift/rotate byte vector with SIMD
08-28
front end
Does anyone have an example where _mm256_stream_load_si256 (non-temporal load to bypass cache) actua
08-24
database
Efficient transpose of 2D nibble matrix?
07-26
Software engineering
Is uops.info wrong about vinserti128?
07-16
Software engineering
How to find the matching element in a tiny array with unique elements as quickly as possible?
07-07
Blockchain
Cannot make rustc use simd instructions for inclusive range loops
07-02
Mobile
CUDA SIMD instruction for per-byte multiplication with unsigned saturation
06-25
Blockchain
SSE interleave/merge/combine 2 vectors using a mask, per-element conditional move?
06-11
OS
AVX2 - storing integers at arbitrary indices in an array
06-11
front end
Why does MOVD/MOVQ between GP and SIMD registers have quite high latency?
06-08
Enterprise
Is there a better way to detect any bits that are set in a 16-byte array of flags?
06-05
OS
Is vfmadd132pd slow on AMD Zen 3 architecture?
05-30
Enterprise
Best way to mask a single bit in AVX2?
05-24
Enterprise
Multiplication of complex numbers using AVX2 FMA3
05-24
Mobile
Multiplication of complex numbers using AVX2
05-20
database
Why does this code execute more slowly after strength-reducing multiplications to loop-carried addit
05-19
database
aarch64-gcc simd inline asm, result always 0
05-18
Blockchain
Sudden jump in sin function frequency
05-14
other
Program in assembly x86
05-04
Software design
How to create a left-packed vector of indices of the 0s in one SIMD vector?
05-04
Software engineering
SIMD - how to add corresponding values from 2 vectors of different element widths (char or uint8_t a
05-02
Back-end
Is there a best way to deal with undefined behavior in bitwise conversion between floats and integer
05-01
database
Implementing matrix operation using AVX in C
04-19
OS
How do I interpret the instruction `mov v2.2d[0],x14` in aarch64 assembly?
04-17
Software engineering
Code duplication issue without define macro
04-12
Enterprise
Count the number of unique values in a 128-bit AVX vector, or detect whether all elements are equal?
04-12
Software engineering
Count the number of unique values in a 128-bit AVX vector
04-07
OS
C Loop vectorization - counting matches of 7-byte records with masking
03-23
Net
Is there a fast way to convert a string of 8 ASCII decimal digits into a binary number?
03-22
front end
Is there a fast way to pack multiple digits into a number?
03-22
OS
fast bit-matrix (64x64) transpose algorithm using SIMD (ARM)
03-21
Mobile
fast bitwise 64x64 bit-matrix transpose algorithm using SIMD (ARM)
03-17
Software engineering
__attribute__ ((vector_size)) magic being destroyed by union
03-11
database
How to interpret uops.info?
02-23
Back-end
AVX performance slower for bitwise xor op and popcount
02-22
Enterprise
How to use SSE instruction set on C6678 DSP?
12-31
Software engineering
Discrepancy in result of Intrinsics vs Naive Vector reduction
12-25
Back-end
Faster memcpy assuming readability and writability of bytes past source and destination buffers
12-24
Software design
AVX2: CountTrailingZeros on 8 bit elements in AVX register