I'm just starting to learn about CPU caches in depth, and out of curiosity I want to learn how to estimate a function's instruction footprint in the CPU cache.
From searching SO and Google I've learned so far that it's not easy to monitor the L1 cache, but surprisingly I couldn't find any posts answering my question.
If it's not possible, it would at least be good to know when someone should worry about filling the L1/L2 caches and when not.
Thanks.
CodePudding user response:
Can you measure it?
Yes. Take a look at the output of a disassembler, or measure how much the library grows when the function is added.
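As a very rough in-program sketch of the same idea, you can compare the addresses of two functions the compiler emitted next to each other. This is only an estimate, not a real measurement: the function names below are made up for illustration, converting function pointers to data pointers is not strictly portable C, and the compiler or linker is free to reorder or pad functions. A disassembler (e.g. `objdump -d`) remains the authoritative answer.

```c
#include <stddef.h>
#include <stdint.h>

/* Two trivial functions; hypothetical names for illustration only. */
int small_func(int x) { return x + 1; }
int next_func(int x)  { return x * 2; }

/* Rough upper bound on small_func's code size: the distance to the
   next emitted function. Caveat: the toolchain may reorder or pad
   functions, so treat this as a sketch, not a measurement. */
ptrdiff_t rough_size(void) {
    return (ptrdiff_t)((uintptr_t)(void *)next_func
                     - (uintptr_t)(void *)small_func);
}
```

On a typical build the result will be a small number of bytes; if it comes out negative, the linker simply placed the functions in the other order.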
Should you worry about it?
Absolutely not. Executable code is usually tiny. If you're only going through it once, even a huge function is going to be fast. What usually makes things slow is loops and recursion, and such functions tend to be focused and small. On most systems the tens of KB of L1 instruction cache, backed by hundreds of KB to a few MB of L2, should cover anything interesting code-wise.
The usual source of cache-dependent speedups is memory access patterns. If your tiny loop skips all over memory, it will be a lot slower than a gigantic function that accesses memory more or less linearly, or at least predictably.
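The classic illustration of this is traversing a 2D array in row-major versus column-major order. Both loops below compute the same sum, but the row-major version walks consecutive addresses while the column-major version jumps `N * sizeof(int)` bytes on every access, touching a new cache line almost every time once `N` is large. (This is a generic sketch; the names and the size `N` are arbitrary.)

```c
#include <stddef.h>

#define N 512  /* arbitrary matrix dimension for illustration */

/* Sum a flat N x N matrix row by row: consecutive addresses,
   so the hardware prefetcher and cache lines are used fully. */
long sum_row_major(const int *m) {
    long s = 0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            s += m[i * N + j];
    return s;
}

/* The same sum column by column: each access strides N ints ahead,
   which typically misses the cache far more often for large N. */
long sum_col_major(const int *m) {
    long s = 0;
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            s += m[i * N + j];
    return s;
}
```

Time the two with the same data and you'll usually see the column-major version lose badly once the matrix no longer fits in L1/L2, even though it executes the same number of instructions.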
Another common source of bad performance is branch misprediction. Incorrectly predicting the outcome of a branch causes a stall in the CPU pipeline; get enough of those and performance will suffer.
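A well-known demonstration is counting elements above a threshold: the loop body is identical either way, but on real hardware it runs noticeably faster when the input is sorted, because the branch then goes the same way for long stretches and the predictor gets it right. A branchless rewrite trades the branch for arithmetic, making the timing independent of the data's order. (Function names here are illustrative, not from the original post.)

```c
#include <stddef.h>

/* Count elements above a threshold. The `if` is the interesting part:
   on random data it is mispredicted often; on sorted data it is
   predictable and the loop runs much faster on typical CPUs. */
size_t count_above(const int *a, size_t n, int threshold) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (a[i] > threshold)
            count++;
    return count;
}

/* Branchless alternative: the comparison result (0 or 1) is added
   directly, so there is no branch to mispredict. */
size_t count_above_branchless(const int *a, size_t n, int threshold) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        count += (size_t)(a[i] > threshold);
    return count;
}
```

Both return the same answer; the difference only shows up in timing, and only when the branch outcome is hard to predict.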
Both of these are usually the last drops of performance to be squeezed out of a system. Make sure the code works correctly first, then look for performance improvements, usually starting with algorithm and data-structure optimizations around the hottest (most executed) bits of code.