{
    asm("PLD [%0, #32]" : : "r"(src));
    *dst++ = *src++;
}
For this loop (n = 1280 * 1280), adding the -fprefetch-loop-arrays option at compile time did not give the speedup people online said it would. How do I use prefetching correctly? Does prefetching really bring memory into the cache ahead of time? I'm confused...
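For reference, here is a self-contained sketch of the loop in the question. It assumes `src` and `dst` are `int` arrays, since the post does not give the element type. It replaces the ARM-only inline `PLD` with the GCC/Clang builtin `__builtin_prefetch`, which compiles on any target (on ARM, GCC typically lowers a read prefetch to `PLD`). The name `copy_prefetch` is hypothetical. The 32-byte look-ahead is kept from the original.

```c
#include <stddef.h>

/* Look-ahead of 32 bytes, matching the #32 offset in the question,
 * expressed in elements (assumes int elements). */
#define PF_AHEAD (32 / sizeof(int))

/* Hypothetical helper: copy n ints, issuing one read prefetch per element,
 * like the inline PLD in the question. */
void copy_prefetch(int *dst, const int *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* Only form in-bounds addresses; rw = 0 means "prefetch for read",
         * locality = 3 means "keep in all cache levels". */
        if (i + PF_AHEAD < n)
            __builtin_prefetch(&src[i + PF_AHEAD], 0, 3);
        dst[i] = src[i];
    }
}
```

Calling `copy_prefetch(dst, src, 1280 * 1280)` on two heap arrays mirrors the setup in the question. Because a prefetch is issued on every element, most of the hints target a cache line that has already been requested, which may be one reason this pattern shows little or no gain.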
Reply:
Try:
    *dst++ = *src++;
    asm("PLDW [%0, #32]" : : "r"(dst));
    asm("PLD [%0, #32]" : : "r"(src));
Use #32 or #64 depending on the cache line size, which is usually 64 or 32 bytes; check the manual for your specific processor.
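A minimal sketch of the same idea in portable C, assuming `int` elements and a 64-byte cache line (set `CACHE_LINE` to 32 if your processor's manual says so). It uses `__builtin_prefetch` with `rw = 1` for the destination (the write hint that `PLDW` expresses) and `rw = 0` for the source (`PLD`). Whether GCC actually emits `PLDW` depends on the target supporting it. It issues one prefetch per cache line instead of one per element. `CACHE_LINE`, `PF_LINES`, and `copy_prefetch_lines` are hypothetical names, and the look-ahead of 4 lines is an untuned guess.

```c
#include <stddef.h>

/* Assumed cache-line size in bytes (commonly 32 or 64 on ARM cores);
 * check the processor's technical reference manual. */
#define CACHE_LINE 64
/* Hypothetical tuning knob: how many cache lines ahead to prefetch. */
#define PF_LINES 4

void copy_prefetch_lines(int *dst, const int *src, size_t n)
{
    const size_t per_line = CACHE_LINE / sizeof(int); /* ints per line */
    const size_t ahead = PF_LINES * per_line;         /* distance in ints */

    for (size_t i = 0; i < n; i++) {
        /* Addresses spaced exactly one line apart touch each line once,
         * so one prefetch per line is enough even if the arrays are not
         * line-aligned. */
        if (i % per_line == 0 && i + ahead < n) {
            __builtin_prefetch(&src[i + ahead], 0, 3); /* read hint, like PLD  */
            __builtin_prefetch(&dst[i + ahead], 1, 3); /* write hint, like PLDW */
        }
        dst[i] = src[i];
    }
}
```

The prefetch distance usually matters more than the offset syntax: 32 bytes ahead is often too close to hide memory latency, so it is worth timing a few values of `PF_LINES` on the target hardware.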