Cache hits on matrix multiplication

https://youtu.be/o7h_sYMk_oc?t=1963

In this video he explains that retrieving data that is far away leads to worse cache-line utilization, and then he follows with a line I don't understand: "so the processor is going to be bringing in 64 bytes to operate on a particular datum. And then it's ignoring 7 of the 8 floating-point words on that cache line and going to the next one." What does he mean by that?

CodePudding user response:

A cache is typically organized in cache lines. When data is read into the cache, a complete cache line is fetched. So if the cache line contains 64 bytes, the processor's hardware reads 64 consecutive bytes from memory into the cache. Since a floating-point double is 8 bytes, a single cache line holds 8 doubles.

Now if your code accesses consecutive doubles, the cache access pattern will be:

Access double located at Addr     --> Miss, 64 bytes read into the cache (slow)
Access double located at Addr+1   --> Hit (fast)
Access double located at Addr+2   --> Hit (fast)
Access double located at Addr+3   --> Hit (fast)
Access double located at Addr+4   --> Hit (fast)
Access double located at Addr+5   --> Hit (fast)
Access double located at Addr+6   --> Hit (fast)
Access double located at Addr+7   --> Hit (fast)
Access double located at Addr+8   --> Miss, 64 bytes read into the cache (slow)
Access double located at Addr+9   --> Hit (fast)
Access double located at Addr+10  --> Hit (fast)
Access double located at Addr+11  --> Hit (fast)
Access double located at Addr+12  --> Hit (fast)
Access double located at Addr+13  --> Hit (fast)
Access double located at Addr+14  --> Hit (fast)
Access double located at Addr+15  --> Hit (fast)
Access double located at Addr+16  --> Miss, 64 bytes read into the cache (slow)
Access double located at Addr+17  --> Hit (fast)
...

So here you have 1 slow read followed by 7 fast reads because your program uses consecutive doubles.

However, if your program only uses doubles that are spaced 8 doubles (i.e. 64 bytes) apart, the pattern will be:

Access double located at Addr     --> Miss, 64 bytes read into the cache (slow)
Access double located at Addr+8   --> Miss, 64 bytes read into the cache (slow)
Access double located at Addr+16  --> Miss, 64 bytes read into the cache (slow)
...

Here every read is a miss, so you get no benefit from the cache: of the 64 bytes brought in per access, only one double is used, which is exactly the "ignoring 7 of the 8 floating-point words" the video describes.
