Efficiently counting ALL pythagorean triples with hypotenuse <= N-CodePudding

I want to count all pythagorean triples (primitive and non-primitive) whose hypotenuse is <= N for a given N. This OEIS link gives this count for powers of 10. A simple but somewhat efficient pseudocode can be easily derived from this link or from wikipedia, using the famous Euclid's formula. Interpreted in Java, this can be made into:

public class countPTS {
    
    public static void main(String[] args) 
    {
        long N = 1000000L;
        
        long count = 0;
        for (long m = 2 ; m * m   1 <= N ; m  )
            for (long n = 1   m % 2 ; n < m && n * n   m * m <= N ; n  = 2)
                if (gcd(m , n) == 1)
                    count  = N / (n * n   m * m);
        
        System.out.println(count);
    }
    
    public static long gcd(long a, long b)
    { 
        if (a == 0) 
            return b; 
        return gcd(b % a, a); 
    } 

}

Which gives the correct results, but is a bit "slow". Regardless of the actual time complexity of this algorithm in big O notation, it seems to grow like a little bit worse than O(n). It can be easily checked that almost the entire time is spent in gcd calculations.

If C(N) is the count of such triples, then this piece of code written and ran on the latest version of Eclipse IDE, using a modern PC and single threaded, yields C(10^10) = 34465432859 in about 90 seconds. This makes me wonder how the high values in the OEIS link were obtained, the largest being C(10^17).

I suspect that an algorithm with a better time complexity was used, perhaps O(n^2/3) or O(n^3/4), maybe with some more advanced mathematical insight. It is somewhat plausible that the above algorithm was ran but with significant improvements and perhaps multithreaded. Can anyone shed light on this topic and higher values?

CodePudding user response：

You can avoid the GCD calculation altogether:

Make a list of all primes < sqrt(N)
Iterate through all subsets of those primes with total product < sqrt(N)
For each subset, the primes in the set will be "m primes", while the other primes will be "n primes"
Let "m_base" be the product of all m primes
For m, iterate through all combinations of at m primes with product < sqrt(N)/m_base, yielding m = m_base * that combination and:
Calculate the maximum value of n for each m
For n, iterate through all combinations of n primes with product < max

CodePudding user response：

TD;LR: There is an approach that is slightly worse than O(N^(2/3)). It consists of enumerating primitive triples with an efficient algorithm up to N^(2/3) and counting the resulting Pythagorean triples, and then using inclusion-exclusion to count Pythagorean primitive triples past that and finding a count of the rest. The rest of this is a detailed explanation of how this all works.

Your direct calculation over primitive triples cannot be made better than O(N). The reason why is that over 60% of pairs of integers are relatively prime. So we can put a lower bound on the number of pairs of relative primes as follows.

There are floor(sqrt(n/2)) choose 2 = O(N) pairs of integers at most sqrt(n/2). The ones that are relatively prime with one even give rise to a primitive Pythagorean triple.
60% of them are relatively prime.
This 60% is contained in the 75% that do not have both numbers even. Therefore when we get rid of pairs with both numbers odd, at least 60% - 25% = 35% of pairs are relatively prime with one of them even.
Therefore there are at least O(n) distinct Pythagorean triples to find whose hypotenuse is less than n.

Given that there are O(N) primitive triples, you'll have to do at least O(N) work to enumerate them. So you don't want to enumerate them all. Instead we'll enumerate up to some point, and then we'll do something else with the rest. But let's make the enumeration more efficient.

Now as you noticed, your GCD checks are taking most of your time. Can we enumerate all of them with hypotenuse less than, say, L more efficiently?

Here is the most efficient approach that I found.

Enumerate the primes up to sqrt(L) with any decent technique. This is O(sqrt(L) log(L) with the Sieve of Eratosthenes, that is good enough.
Using this list, we can produce the set of odd numbers up to sqrt(L), complete with their prime factorizations, fairly efficiently. (Just recursively produce prime factorizations that fit in the range.)
Given an odd number and its prime factorizations, we can run through the even numbers and quickly find which are relatively prime. That gives us the primitive Pythagorean triples.

For step 3, create a PriorityQueue whose elements start off pairs (2*p, 2*p) for each distinct prime factor p. For example for 15 we'd have a queue with (6, 6) and (10, 10). We then do this:

i = the odd number we started with
j = 2
while i*i   j*j < L:
    if queue.peek()[0] == i:
        while queue.peek()[0] == i:
            (x, y) = queue.pop()
            queue.add((x y, y))
    else:
        i*i   j*j is the hypotenuse of a primitive Pythagorean triple

This is very slightly superlinear. In fact the expected complexity of finding all the numbers out to n that are relatively prime to i is O(n * log(log(log(i)))). Why? Because the average number of distinct prime factors is on average O(log(log(i))). The queue operations are O(log(THAT)) and we do on average O(n) queue operations. (I'm waving my hands past some number theory, but the result is correct.)

OK, what is our alternative to actually enumerating many of the primitive triples?

The answer is to use inclusion-exclusion to count the number below some bound, L Which I will demonstrate by counting the number of primitive triples whose hypotenuse is at most 100.

The first step is to implement a function EvenOddPairCount(L) that counts how many pairs of even/odd numbers there are whose squares sum up to at most L . To do this we traverse the fringe of maximal even/odds. This takes O(sqrt(L)) to traverse. Here is that calculation for 100:

1*1   8*8, 4 pairs
3*3   8*8, 4 pairs
5*5   8*5, 4 pairs
7*7   6*6, 3 pairs
9*9   2*2, 1 pair

Which gives 16 pairs.

Now the problem is that we have counted things like 3*3 6*6 = 45 that are not relatively prime. But we can subtract off EvenOddPairCount(L/9) to find those that are both divisible by 3. We can likewise subtract off EvenOddPairCount(L/25) to find the ones that are divisible by 5. But now the ones divisible by 15 have been added once, and subtracted twice. So we have to add back in EvenOddPairCount(L/(15*15)) to again not count those.

Keep going and you get a sum is over all distinct products x of odd primes of EvenOddPairCount(L/(x*x)) where we add if we had an even number of prime factors (including 0), and subtract if we had an odd number. Since we already have the odd primes, we can easily produce the sequence of inclusion-exclusion terms. It starts off 1, -9, -25, -49, -121, -169, 225, -289, -361, 441 and so on.

But how much work does this take? If l = sqrt(L) it takes l l/3 l/5 l/7 l/11 l/13 l/15 ... 1/l < l*(1 1/2 1/3 1/4 ... 1/l) = O(l log(l)) = O(sqrt(L) log(L)).

And so we can calculate PrimitiveTriples(L) in time O(sqrt(L) log(L)).

What does this buy us? Well, PrimitiveTriples(N) - PrimitiveTriples(N/2) gives us the number of primitive Pythagorean triples whose hypotenuse is in the range from N/2 to N. And 2*(PrimitiveTriples(N/2) - PrimitiveTriples(N/3)) gives us the number of Pythagorean triples which are multiples of primitive ones in the range from N/3 to N/2. And 3*(PrimitiveTriples(N/3) - PrimitiveTriples(N/4) gives us the number of Pythagorean triples in the range from N/4 to N/3.

Therefore we can enumerate small primitive Pythagorean triples, and figure out how many Pythagorean triples are a multiple of those. And we can use inclusion-exclusion to count the multiples of large Pythagorean triples. Where is the cutoff between the two strategies?

Let's ignore the various log factors for the moment. If we cut off at N^c then we do roughly O(N^c) work on the small. And we need to calculate N^(1-c) thresholds of PrimitiveTriples, the last of which takes work roughly O(N^(c/2)). So let's try setting N^c = N^(1-c) * N^(c/2). That works out when c = (1-c) c/2 which happens when c = 2/3.

And this is the final algorithm.

Find all primes up to sqrt(N).
Use those primes to enumerate all primitive triples up to O(n^(2/3)) and figure out how many Pythagorean triples they give.
Find all of the inclusion exclusion terms up to N (there are O(sqrt(N)) of them and can be produced directly from the factorization).
Calculate PythagoreanTriples(N/d) for all d up to N^(1/3). Use that to count the rest of the triples.

And all of this runs in time o(N^(2/3 epsilon)) for any epsilon greater than 0.