https://youtu.be/o7h_sYMk_oc?t=1963
In this video he explains that retrieving data that is far apart in memory leads to worse cache line utilization, and then he follows with a line that I don't understand: "so the processor is going to be bringing in 64 bytes to operate on a particular datum. And then it's ignoring 7 of the 8 floating-point words on that cache line and going to the next one". What does he mean by that?
CodePudding user response:
A cache is typically organized in cache lines. When data is read into the cache, a complete cache line is read at once. So if a cache line contains 64 bytes, the processor's hardware reads 64 consecutive bytes from memory into the cache. If a double-precision floating-point value is 8 bytes, a single cache line can hold 8 doubles.
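As a rough illustration of that arithmetic, here is a minimal sketch that maps the index of a double in an array to the cache line holding it. The 64-byte line and 8-byte double come from the text above; the function name `cache_line_index` and the assumption that the array starts at a cache-line-aligned address (`base_addr=0`) are mine, purely for illustration.

```python
LINE_BYTES = 64    # cache line size assumed in the answer
DOUBLE_BYTES = 8   # size of one double

def cache_line_index(double_index, base_addr=0):
    """Return the number of the cache line that holds the given double.

    base_addr is assumed to be cache-line aligned (hypothetical default 0).
    """
    return (base_addr + double_index * DOUBLE_BYTES) // LINE_BYTES

doubles_per_line = LINE_BYTES // DOUBLE_BYTES
lines = [cache_line_index(i) for i in range(16)]

print(doubles_per_line)  # how many doubles fit in one line
print(lines)             # doubles 0..7 share one line, 8..15 share the next
```

Under these assumptions, any 8 consecutive doubles that start on a line boundary land in the same cache line.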
Now if your code uses consecutive doubles, starting at a cache-line-aligned address Addr and with offsets counted in doubles, the cache accesses will be:
Access double located in Addr --> Miss, 64 bytes read into the cache (slow)
Access double located in Addr + 1 --> Hit (fast)
Access double located in Addr + 2 --> Hit (fast)
Access double located in Addr + 3 --> Hit (fast)
Access double located in Addr + 4 --> Hit (fast)
Access double located in Addr + 5 --> Hit (fast)
Access double located in Addr + 6 --> Hit (fast)
Access double located in Addr + 7 --> Hit (fast)
Access double located in Addr + 8 --> Miss, 64 bytes read into the cache (slow)
Access double located in Addr + 9 --> Hit (fast)
Access double located in Addr + 10 --> Hit (fast)
Access double located in Addr + 11 --> Hit (fast)
Access double located in Addr + 12 --> Hit (fast)
Access double located in Addr + 13 --> Hit (fast)
Access double located in Addr + 14 --> Hit (fast)
Access double located in Addr + 15 --> Hit (fast)
Access double located in Addr + 16 --> Miss, 64 bytes read into the cache (slow)
Access double located in Addr + 17 --> Hit (fast)
...
So here you have 1 slow read followed by 7 fast reads because your program uses consecutive doubles.
However, if your program always uses doubles that are placed 8 doubles (i.e. 64 bytes) apart, your pattern will be:
Access double located in Addr --> Miss, 64 bytes read into the cache (slow)
Access double located in Addr + 8 --> Miss, 64 bytes read into the cache (slow)
Access double located in Addr + 16 --> Miss, 64 bytes read into the cache (slow)
...
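Here you only get slow reads and no benefit from the cache system. Every miss still brings a full 64-byte line into the cache, but your program uses only one 8-byte double from it before moving on to the next line. That is what he means by "ignoring 7 of the 8 floating-point words on that cache line".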
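To make the two patterns concrete, here is a small sketch that counts misses in a deliberately simplified cache model. It assumes 64-byte lines, 8-byte doubles, a line-aligned starting address, no prefetching, and a cache that never evicts anything; real hardware is more complicated, and the name `count_misses` is just a hypothetical helper.

```python
LINE_BYTES = 64    # assumed cache line size
DOUBLE_BYTES = 8   # size of one double

def count_misses(num_accesses, stride_doubles, base_addr=0):
    """Count misses for a strided walk over an array of doubles.

    Simplified model: a line misses the first time it is touched and
    hits ever after (no eviction, no hardware prefetcher).
    """
    cached_lines = set()
    misses = 0
    for i in range(num_accesses):
        byte_addr = base_addr + i * stride_doubles * DOUBLE_BYTES
        line = byte_addr // LINE_BYTES
        if line not in cached_lines:
            cached_lines.add(line)   # the whole 64-byte line is fetched
            misses += 1
    return misses

accesses = 1024
sequential_misses = count_misses(accesses, stride_doubles=1)
strided_misses = count_misses(accesses, stride_doubles=8)

# Fraction of the fetched bytes that the program actually reads.
sequential_use = accesses * DOUBLE_BYTES / (sequential_misses * LINE_BYTES)
strided_use = accesses * DOUBLE_BYTES / (strided_misses * LINE_BYTES)

print(sequential_misses, sequential_use)  # 128 misses, utilization 1.0
print(strided_misses, strided_use)        # 1024 misses, utilization 0.125
```

In this model the consecutive walk misses once per 8 accesses and uses every byte it fetches, while the stride-8 walk misses on every access and uses only 1/8 of each line.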