The following program in C# computes 10 million Babylonian iterations for the square root.
using System;
using System.Diagnostics;
namespace Performance {
    public class Program {
        // Runs f n times and prints the throughput and the elapsed time.
        public static void MeasureTime(long n, Action f) {
            Stopwatch watch = new Stopwatch();
            watch.Start();
            for (long i = 0; i < n; i++) f();
            watch.Stop();
            Console.WriteLine($"{(n / watch.ElapsedMilliseconds) / 1000} Mop/s, {watch.ElapsedMilliseconds} ms");
        }
        public static void TestSpeed(double a) {
            Console.WriteLine($"Parameter {a}");
            double x = a;
            long n = 10_000_000;
            // Babylonian iteration for the square root of a.
            MeasureTime(n, () => x = (a / x + x) / 2);
            Console.WriteLine($"{x}\n");
        }
        static void Main(string[] args) {
            TestSpeed(2);
            TestSpeed(Double.PositiveInfinity);
        }
    }
}
When I run this on my computer in Release mode, I get:
Parameter 2
99 Mop/s, 101 ms
1,41421356237309
Parameter ∞
3 Mop/s, 3214 ms
NaN
Here Mop/s stands for million operations per second. When the parameter is infinity, for some reason the code slows down by more than 30x.
Why is this?
For comparison, here is the same program written in C++20:
#include <iostream>
#include <chrono>
#include <format>
#include <limits>
namespace Performance {
    // Runs f n times and prints the throughput and the elapsed time.
    template <typename F>
    void MeasureTime(long long n, F f) {
        auto begin = std::chrono::steady_clock::now();
        for (long long i = 0; i < n; i++) f();
        auto end = std::chrono::steady_clock::now();
        auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(end - begin).count();
        std::cout << std::format("{0} Mop/s, {1} ms", (n / ms) / 1000, ms) << std::endl;
    }
    void TestSpeed(double a) {
        std::cout << std::format("Parameter {0}", a) << std::endl;
        double x = a;
        long long n = 10'000'000;
        // Babylonian iteration for the square root of a.
        MeasureTime(n, [&]() { x = (a / x + x) / 2; });
        std::cout << std::format("{0}\n\n", x);
    }
}
using namespace Performance;
int main() {
    auto inf = std::numeric_limits<double>::infinity();
    TestSpeed(2);
    TestSpeed(inf);
    return 0;
}
When I run this program in Release mode, I get:
Parameter 2
181 Mop/s, 55 ms
1.414213562373095
Parameter inf
192 Mop/s, 52 ms
-nan(ind)
This is as expected, i.e. there is no difference in performance.
Both programs were built in Visual Studio 2022 version 17.1.0. The C# project is a .NET Framework 4.7.2 Console Application.
CodePudding user response:
The problem was fixed by unchecking Prefer 32-bit in the C# project options.
I was also able to reproduce the performance problem on the C++ side by changing the Enable Enhanced Instruction Set option in Visual Studio to either No Enhanced Instructions (/arch:IA32) or Streaming SIMD Extensions (/arch:SSE). These options are only available when building a 32-bit program. As was hinted by @shingo in the comments, there seems to be a performance problem when computing with NaNs in older 32-bit instruction sets. Neither of these two options enables SSE2, so double arithmetic presumably falls back to the x87 FPU. Indeed, the given code computes solely with NaNs when the parameter a is set to infinity.