Why is my theoretically more efficient prime tester slower?-CodePudding

I am experimenting with different prime testers in Haskell. This algorithm checks whether an integer n is prime by trying to divide it by every integer i from 2 to sqrt(n):

isPrime2a n = 
    helper 2
  where
    helper i
      | (i * i) > n = True
      | mod n i == 0 = False
      | otherwise = helper (i 1)

I believe this implementation to test only prime divisors:

primes3a = 2 : 3 : filter isPrime3a [4..]
isPrime3a n = 
    not (divisibleByAny (tail primes3a))
  where
    divisibleByAny (i : rest)
      | (i * i) > n = False
      | mod n i == 0 = True
      | otherwise = divisibleByAny rest

This transcript shows that (contrary to my expectations) the first function is faster:

ghci> isPrime2a 10000000019
True
(0.14 secs, 50,455,088 bytes)

ghci> isPrime3a 10000000019
True
(1.27 secs, 475,058,704 bytes)

How come? Is my analysis of the algorithms incorrect, or does the overhead of building up a list outweigh the reduced number of mod operations?

CodePudding user response：

It's not just the memory.

Step through how this works for a reasonably small number, like 29.

isPrime2a has to divide 29 by 2, then by 3, then by 4, then by 5, then it stops at 6 (because 6*6 = 36, which is > 29) and reports True.

isPrime3a has to divide 29 by 2, then by 3, then by the next element of primes3a. To work out what that is filter isPrime3a [4..] has to divide 4 by 2 to confirm the next element isn't 4. Then filter isPrime3a has to divide 5 by 2, then stop at 3 to confirm 5 is the next element of primes3a.

So then isPrime3a can divide 29 by 5, and then we're back to needing the next element of primes3a, which means running filter isPrime3a some more. We were up to 6, so we divide 6 by 2 to find out it isn't the next element of primes3a. Then filter isPrime3a considers 7, which means dividing it by 2, then by 3, then stopping at 5 (4 wasn't in primes3a, remember) to confirm 7 is in primes3a.

Finally isPrime3a can stop, because 7*7 = 49, which is > 29.

That was way more steps! Note that isPrime3a only had to divide 29 by 2, 3, and 5, while isPrime2a had to divide 29 by 2, 3, 4, 5, and 6; isPrime3a got to skip over 4 and 6, saving 2 divisions. But to do that it had to divide 4 by 2, 5 by 2 and 3, 6 by 2, and 7 by 2 and 3; to save those 2 divisions we did 6 other ones! There's also some administrative overhead going on like not and filter that isn't present at all in primes2a, and when all you're doing is arithmetic operations stuff like that is probably significant (at least until we get to huge numbers).

However, we see different behaviour when we run each of them more than once:

λ isPrime2a 10000000019
True
it :: Bool
(0.07 secs, 50,474,736 bytes)
*Main
λ isPrime2a 10000000019
True
it :: Bool
(0.07 secs, 50,474,736 bytes)
*Main
λ isPrime2a 10000000019
True
it :: Bool
(0.06 secs, 50,474,736 bytes)
*Main
λ isPrime2a 10000000019
True
it :: Bool
(0.07 secs, 50,474,736 bytes)

Notice that it takes the same time (and memory) for repeat executions.

λ isPrime3a 10000000019
True
it :: Bool
(0.68 secs, 475,082,944 bytes)
*Main
λ isPrime3a 10000000019
True
it :: Bool
(0.01 secs, 4,198,032 bytes)
*Main
λ isPrime3a 10000000019
True
it :: Bool
(0.01 secs, 4,198,032 bytes)
*Main
λ isPrime3a 10000000019
True
it :: Bool
(0.02 secs, 4,198,032 bytes)

isPrime3a takes much more time and memory than isPrime2a on the first execution. Subsequent ones are much faster and allocate less; the primes3a list has already been built, so although it's occupying a good chunk of memory it doesn't have to be built for any following isPrime3a query (that doesn't need any more primes).

Also, you have a bug in your implementation.

λ take 20 primes3a
[2,3,4,5,6,7,8,10,11,13,14,17,19,22,23,26,29,31,34,37]
it :: [Integer]

Because you check not (divisibleByAny (tail primes3a)), you never check for divisibility by the head of primes3a, namely 2. Thus all even numbers are reported as primes. Fixing that:

λ isPrime3a' 10000000019
True
it :: Bool
(0.33 secs, 230,822,560 bytes)
*Main
λ isPrime3a' 10000000019
True
it :: Bool
(0.01 secs, 2,760,760 bytes)
*Main
λ isPrime3a' 10000000019
True
it :: Bool
(0.01 secs, 2,760,760 bytes)
*Main
λ isPrime3a' 10000000019
True
it :: Bool
(0.01 secs, 2,760,760 bytes)

Doesn't change the conclusion really. It's still much slower than isPrimes2 on the first execution, much faster on subsequent runs.

Long story short: both of these behaviours are useful. If you needed to test primality in a real program and had to choose between these two implementations, you might go for primes2a if you only needed to test one (or a handful) of numbers. primes3a looks good if you're going to test lots of numbers, and you can afford to keep a large list in memory the whole time (or you're testing numbers small enough that the list isn't large). Neither one isn't strictly better than the other, on these numbers.

Of course, as Louis Wasserman pointed out in the comments, serious performance measurement (i.e. meaningful to help you predict the performance of programs as they would be run for practical purposes) should not be made in GHCi. How to do it properly is a whole other topic, but the biggest thing you're missing is that the code is not optimised, and GHC's optimizer can make a huge difference to the speed and execution of code, but it is also very non-uniform in how much it speeds things up. It's very possible that you can have two versions of an algorithm where one is more than 10 times faster than the other when you compare an unoptimised build while the reverse is true when you compare optimised build. If you care about the performance then you'll almost always going to run it with optimisation, so that's usually the state you want to measure.