I have code in which I perform many operations of the form
double z_sq = pow(abs(z), 2);
to calculate |z|² of a complex number z (a std::complex<double>), where performance and precision are my major concerns. I read that the C pow() function converts an int exponent to double and thus introduces numerical errors, apart from also having worse performance than x*x in general (discussed here for real numbers). Similarly, the std::pow() overload from <complex> unnecessarily converts the exponent to std::complex<double>.
My first question is how to implement the complex square Re² + Im², i.e. zz*, with the best possible performance and precision. I thought about doing something like
double complex_square1(std::complex<double> z) {
    double abs_z = abs(z);
    return abs_z * abs_z;
}
or (which gives a type error, but would be the most straightforward way I can think of)
double complex_square2(std::complex<double> z) {
    return z * conj(z);
}
My second question is whether, for many (10^14) such operations, the function calls can decrease performance noticeably. Would it be better, in the case of complex_square1, to write abs(z) * abs(z) directly in the .cpp file instead of defining a function for it? Are there any other, better ways to do it?
CodePudding user response:
I think it would be hard to beat just taking the sum of the squares of the imaginary and real parts.
Below I measure it to be about 5x faster than actually calculating the magnitude and squaring it:
#include <algorithm>
#include <chrono>
#include <complex>
#include <iostream>
#include <random>
#include <vector>
double square_of_magnitude_1(std::complex<double> z) {
    auto magnitude = std::abs(z);
    return magnitude * magnitude;
}
double square_of_magnitude_2(std::complex<double> z) {
    return z.imag() * z.imag() + z.real() * z.real();
}
volatile double v; // assign to volatile so calls do not get optimized away.
std::random_device rd;
std::mt19937 e2(rd());
std::uniform_real_distribution<double> dist(0, 10);
int main() {
    using std::chrono::high_resolution_clock;
    using std::chrono::duration_cast;
    using std::chrono::duration;
    using std::chrono::microseconds;

    std::vector<std::complex<double>> numbers(10000000);
    std::generate(numbers.begin(), numbers.end(), []() { return std::complex<double>(dist(e2), dist(e2)); });

    auto t1 = high_resolution_clock::now();
    for (const auto& z : numbers) {
        v = square_of_magnitude_1(z);
    }
    auto t2 = high_resolution_clock::now();
    std::cout << "square_of_magnitude_1 => " << duration_cast<microseconds>(t2 - t1).count() << "\n";

    t1 = high_resolution_clock::now();
    for (const auto& z : numbers) {
        v = square_of_magnitude_2(z);
    }
    t2 = high_resolution_clock::now();
    std::cout << "square_of_magnitude_2 => " << duration_cast<microseconds>(t2 - t1).count() << "\n";
}
A typical run of the above yields
square_of_magnitude_1 => 54948
square_of_magnitude_2 => 9730
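As an aside, the standard library already spells this quantity as std::norm(z). On the implementations I am aware of it is just this sum of squares, though that is an assumption rather than a guarantee of equal speed; a minimal check might look like:
#include <cassert>
#include <complex>

int main() {
    std::complex<double> z(3.0, 4.0);
    // std::norm returns the squared magnitude |z|^2, i.e. Re^2 + Im^2.
    double via_norm = std::norm(z);                                // 25.0
    double by_hand  = z.real() * z.real() + z.imag() * z.imag();  // 25.0
    assert(via_norm == by_hand);  // holds on typical implementations; not formally guaranteed
}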
CodePudding user response:
Based on Peter's answer (thanks again!), which is the fastest of the suggested methods, here is my comparison code for anyone interested:
#include <chrono>
#include <complex>
#include <cstdio>
using complex = std::complex<double>;
double complex_square1(complex z) {
    return z.real() * z.real() + z.imag() * z.imag();
}
double complex_square2(complex z) {
    double abs_z = abs(z);
    return abs_z * abs_z;
}
complex z = complex(0.5, 0.5);
unsigned int N = (int)1e7;
double z_sq;
int main() {
    auto start = std::chrono::high_resolution_clock::now();
    for (size_t i = 0; i < N; ++i) {
        z_sq = z.real() * z.real() + z.imag() * z.imag();  // fastest
        // z_sq = complex_square1(z);                      // slower by a factor 1.6
        // z_sq = complex_square2(z);                      // slower by a factor 5.5
        // z_sq = pow(abs(z), 2);                          // slower by a factor 6
        // z_sq = norm(z);                                 // slower by a factor 5
    }
    auto end = std::chrono::high_resolution_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
    printf("Duration: %lld ms\n", (long long)duration);
    printf("z_sq = %f\n", z_sq);
}
This also answers my second question: the function calls do become noticeable, but only as a fractional increase in computation time of about 60%, measured on my Windows machine with the GCC compiler.
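One more note on the call overhead: in an optimized build (e.g. g++ -O2), a small helper defined in the same translation unit is usually inlined, so a wrapper like the one below should cost the same as writing the expression out by hand. This is an assumption about the optimizer, not something measured above:
#include <complex>

// Hypothetical wrapper; with optimizations enabled the call is expected to be inlined away.
inline double complex_square(const std::complex<double>& z) {
    return z.real() * z.real() + z.imag() * z.imag();
}

int main() {
    std::complex<double> z(0.5, 0.5);
    volatile double z_sq = complex_square(z);  // volatile keeps the result from being optimized out
    (void)z_sq;
}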