I want to measure the time a function I programmed takes when executed. While there are many threads on SO on how to measure time in C++, I did not find any that explain how to prevent the compiler from optimizing away my function call. Right now I do something like:
bool testFunctionSpeed() {
    auto input = loadInput();
    auto start = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < num_iterations; i++) {
        auto result = function(input);  // result is never used after this line
    }
    auto end = std::chrono::high_resolution_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
    return true;
}
It appears to work right now, but I'm not sure it is reliable: the compiler can see that the value stored in "result" is never used outside the loop's scope, so it might just optimize the call away (...right?).
I am aware that there are flags to disable optimization; however, I want to benchmark my code under "real" conditions! One way would be to printf random parts of the result, but that does not seem to be a "correct" solution.
What is the right approach to this?
CodePudding user response:
To prevent the compiler from optimizing away the function call, make both the input and the output of that function volatile variables.
The result is then guaranteed to be computed and stored in the volatile output variable on each loop iteration.
The volatile input prevents the optimizer from precomputing the value of your function in advance. If you don't mark the input as volatile, the compiler may just write a constant to the output variable on each loop iteration.
Your code example with improvements is below:
#include <chrono>
#include <cmath>
#include <cstddef>
#include <iostream>

// Example function under test.
int function(int x) {
    return int(std::log2(x));
}

bool testFunctionSpeed() {
    size_t const num_iterations = 1 << 20;
    // volatile input: the value must be re-read on every iteration,
    // so the call cannot be precomputed at compile time
    auto volatile input = 123;
    auto start = std::chrono::high_resolution_clock::now();
    for (size_t i = 0; i < num_iterations; ++i) {
        // volatile output: the store must actually be performed
        auto volatile result = function(input);
    }
    auto end = std::chrono::high_resolution_clock::now();
    // average time per call in nanoseconds (integer division)
    auto duration = std::chrono::duration_cast<std::chrono::nanoseconds>(
        end - start).count() / num_iterations;
    std::cout << duration << " ns per iteration" << std::endl;
    return true;
}

int main() {
    testFunctionSpeed();
}
Output:
8 ns per iteration
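If you need this for more than one function, the same volatile input/output pattern can be wrapped in a small helper. The following is only a sketch under a few assumptions: the name nsPerIteration is made up for illustration, the function under test is assumed to take and return an int, and std::chrono::steady_clock is swapped in for high_resolution_clock because it is guaranteed to be monotonic.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>

// Same example function as in the answer above.
int function(int x) {
    return int(std::log2(x));
}

// Hypothetical helper (not part of any library): returns the average
// number of nanoseconds per call of f(input_value).
template <typename F>
double nsPerIteration(F f, int input_value, std::size_t num_iterations) {
    assert(num_iterations > 0);
    // volatile input: re-read on every iteration, so the call cannot be
    // replaced by a precomputed constant
    int volatile input = input_value;
    // Assumption: steady_clock instead of high_resolution_clock (monotonic)
    auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < num_iterations; ++i) {
        // volatile output: the store has to be performed each time
        int volatile result = f(input);
    }
    auto end = std::chrono::steady_clock::now();
    auto total = std::chrono::duration_cast<std::chrono::nanoseconds>(
        end - start).count();
    return static_cast<double>(total) / static_cast<double>(num_iterations);
}
```

It would be called as, for example, nsPerIteration(function, 123, 1 << 20). Returning a double avoids the integer division of the original, which rounds very fast functions down to whole nanoseconds. Note that the measured time still includes the loop overhead and the volatile load and store.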