__builtin_prefetch making it faster in my code. What do I need to do in the code?

Time:11-20

In this program:

#include <stdio.h>
#include <stdint.h>
int main()
{
    uint16_t *data = (uint16_t[]){1,2,3,4,5,6,7,8,9,10};
    int mlen = 10;
    uint16_t partial = 0;
    __builtin_prefetch(data + 8);
    while (mlen > 0) {
        partial += *(uint16_t *)data;
        data += 1;
        mlen -= 1;
    }
    return 0;
}

I am using __builtin_prefetch(data + 8); so that the elements up to index 8 are fetched into the cache. But when I compile the program with

  gcc prefetcher.c -DDO_PREFETCH -o with-prefetch -std=c11 -O3

it is slower than this

  gcc prefetcher.c -o no-prefetch -std=c11 -O3

This is the output for each, respectively. With the prefetch:

         12401      L1-dcache-load-misses     #    6.76% of all L1-dcache accesses
        183459      L1-dcache-loads                                             

   0.000881880 seconds time elapsed

   0.000952000 seconds user
   0.000000000 seconds sys

And this is without the prefetch:

         12991      L1-dcache-load-misses     #    6.87% of all L1-dcache accesses
        189161      L1-dcache-loads

   0.001349719 seconds time elapsed

   0.001423000 seconds user
   0.000000000 seconds sys

What do I need to do so that my __builtin_prefetch code runs faster?

The output above is from the perf program.

CodePudding user response:

"What do I need to do so that my __builtin_prefetch code runs faster?"

You need to remove __builtin_prefetch. It's literally the only instruction that differs between the two builds. The compiler optimized your whole program to a no-op, because the loop has no side effects and its result, partial, is never used.

Without the prefetch, your code is compiled to:

main:
        xor     eax, eax
        ret

With the prefetch, it is compiled to:

main:
        xor     eax, eax
        prefetcht0      [rsp-24]
        ret

Even if you return partial, for example, the compiler can calculate the entire result at compile time and reduce the whole program to just return <constant>.
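If you want a measurement in which the loop actually survives -O3, the input has to be something the compiler cannot see at compile time, and the result has to be used. Here is a minimal sketch under those assumptions. The names make_input, sum_u16 and PREFETCH_AHEAD are made up for illustration, and the distance of 64 elements is a guess rather than a tuned value. DO_PREFETCH is the macro from your compile command.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical prefetch distance in elements; tune it by measuring. */
#define PREFETCH_AHEAD 64

/* Hypothetical helper: fill a heap buffer with 1, 2, 3, ... (mod 65536).
   The caller owns the buffer and must free() it. */
uint16_t *make_input(size_t n)
{
    uint16_t *data = malloc(n * sizeof *data);
    if (data == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        data[i] = (uint16_t)(i + 1);
    return data;
}

/* Sum n 16-bit values, wrapping modulo 65536 like `partial` in the question. */
uint16_t sum_u16(const uint16_t *data, size_t n)
{
    uint16_t partial = 0;
    for (size_t i = 0; i < n; i++) {
#ifdef DO_PREFETCH
        /* Hint for an element a fixed distance ahead, staying in bounds. */
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&data[i + PREFETCH_AHEAD]);
#endif
        partial += data[i];
    }
    return partial;
}
```

Call sum_u16 from main with a length taken from argv and print the returned value, so that neither the input nor the output can be folded away. Even then, a linear scan like this is the access pattern that hardware prefetchers usually handle well on their own, so the explicit hint may make no measurable difference.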

You can easily inspect the generated assembly of your programs using https://godbolt.org/.
