I have code in which I perform many operations of the form
double z_sq = pow(abs(z), 2);
to calculate |z|² of a complex number z (a std::complex<double>), where performance and precision are my major concerns. I read that the C pow() function converts an int exponent to double and thus introduces numerical errors, apart from also having worse performance than x*x in general (discussed here for real numbers). Similarly, the std::pow() overload from <complex> unnecessarily converts the exponent to std::complex<double>.
My first question is how to implement the complex square Re² + Im², i.e. zz*, with the best possible performance and precision. I thought about doing something like
double complex_square1(std::complex<double> z) {
    double abs_z = abs(z);
    return abs_z * abs_z;
}
or (which gives a type error, but would be the most straightforward way I can think of)
double complex_square2(std::complex<double> z) {
    return z * conj(z);
}
My second question is whether, for many (10^14) such operations, the function calls can decrease performance noticeably. Would it be better, in the case of complex_square1, to write abs(z) * abs(z) directly in the .cpp file instead of defining a function for it? Are there any other, better ways to do it?
CodePudding user response:
I think it would be hard to beat just taking the sum of the squares of the imaginary and real parts.
Below I measure it to be about 5x faster than actually calculating the magnitude and squaring it:
#include <algorithm>
#include <chrono>
#include <complex>
#include <iostream>
#include <random>
#include <vector>
double square_of_magnitude_1(std::complex<double> z) {
    auto magnitude = std::abs(z);
    return magnitude * magnitude;
}
double square_of_magnitude_2(std::complex<double> z) {
    return z.imag() * z.imag() + z.real() * z.real();
}
volatile double v; // assign to volatile so calls do not get optimized away.
std::random_device rd;
std::mt19937 e2(rd());
std::uniform_real_distribution<double> dist(0, 10);
int main() {
    using std::chrono::high_resolution_clock;
    using std::chrono::duration_cast;
    using std::chrono::duration;
    using std::chrono::microseconds;

    std::vector<std::complex<double>> numbers(10000000);
    std::generate(numbers.begin(), numbers.end(), []() { return std::complex<double>(dist(e2), dist(e2)); });

    auto t1 = high_resolution_clock::now();
    for (const auto& z : numbers) {
        v = square_of_magnitude_1(z);
    }
    auto t2 = high_resolution_clock::now();
    std::cout << "square_of_magnitude_1 => " << duration_cast<microseconds>(t2 - t1).count() << "\n";

    t1 = high_resolution_clock::now();
    for (const auto& z : numbers) {
        v = square_of_magnitude_2(z);
    }
    t2 = high_resolution_clock::now();
    std::cout << "square_of_magnitude_2 => " << duration_cast<microseconds>(t2 - t1).count() << "\n";
}
A typical run of the above yields
square_of_magnitude_1 => 54948
square_of_magnitude_2 => 9730
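As an aside, the standard library already spells this quantity as std::norm(z). On the implementations I am aware of it is just this sum of squares, though that is an assumption rather than a guarantee of equal speed; a minimal check might look like:
#include <cassert>
#include <complex>

int main() {
    std::complex<double> z(3.0, 4.0);
    // std::norm returns the squared magnitude |z|^2, i.e. Re^2 + Im^2.
    double via_norm = std::norm(z);                                // 25.0
    double by_hand  = z.real() * z.real() + z.imag() * z.imag();  // 25.0
    assert(via_norm == by_hand);  // holds on typical implementations; not formally guaranteed
}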
CodePudding user response:
Based on Peter's answer (thanks again!), which is the fastest of the suggested methods, here is my comparison code for anyone interested:
#include <chrono>
#include <complex>
#include <cstdio>
using complex = std::complex<double>;
double complex_square1(complex z) {
    return z.real() * z.real() + z.imag() * z.imag();
}
double complex_square2(complex z) {
    double abs_z = abs(z);
    return abs_z * abs_z;
}
complex z = complex(0.5, 0.5);
unsigned int N = (int)1e7;
double z_sq;
int main() {
    auto start = std::chrono::high_resolution_clock::now();
    for (size_t i = 0; i < N; ++i) {
        z_sq = z.real() * z.real() + z.imag() * z.imag();  // fastest
        // z_sq = complex_square1(z);                      // slower by a factor 1.6
        // z_sq = complex_square2(z);                      // slower by a factor 5.5
        // z_sq = pow(abs(z), 2);                          // slower by a factor 6
        // z_sq = norm(z);                                 // slower by a factor 5
    }
    auto end = std::chrono::high_resolution_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
    printf("Duration: %lld ms\n", (long long)duration);
    printf("z_sq = %f\n", z_sq);
}
This also answers my second question: the function calls do become noticeable, but only as a fractional increase in computation time of about 60%, measured on my Windows machine with the GCC compiler.
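One more note on the call overhead: in an optimized build (e.g. g++ -O2), a small helper defined in the same translation unit is usually inlined, so a wrapper like the one below should cost the same as writing the expression out by hand. This is an assumption about the optimizer, not something measured above:
#include <complex>

// Hypothetical wrapper; with optimizations enabled the call is expected to be inlined away.
inline double complex_square(const std::complex<double>& z) {
    return z.real() * z.real() + z.imag() * z.imag();
}

int main() {
    std::complex<double> z(0.5, 0.5);
    volatile double z_sq = complex_square(z);  // volatile keeps the result from being optimized out
    (void)z_sq;
}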