Is modding 1024 really faster than modding 1023?-CodePudding

My colleage told me that code modding 2's power will be optimized to bit operation and faster than modding other numbers. And I've checked the assembly which proves his option. But I wrote a benchmark code in Golang and run it with Go version 1.17. It seems that there is no much difference. Why does that happened and is he right?

Here is the Golang code:

package main

import (
    "fmt"
    "time"
)

const loop = 10000000000

func Mod1024() int {
    sum := 0
    for i := 0; i < loop; i   {
        sum  = i % 1024
    }
    return sum
}

func Mod1023() int {
    sum := 0
    for i := 0; i < loop; i   {
        sum  = i % 1023
    }
    return sum
}

func main() {
    start := time.Now()
    Mod1023()
    fmt.Println(time.Since(start).Microseconds())

    start = time.Now()
    Mod1024()
    fmt.Println(time.Since(start).Microseconds())
}

The result on my computer is:

2810668
2694136

CodePudding user response：

Executing a single modulo operation is really fast, it's in the magnitude of a single nanosecond. What your Mod1024() and Mod1023() functions do are a lot more: they increment and test a loop variable, perform addition and store the result in a local variable. These altogether are orders of magnitude more than a simple modding operation, be it optimized to bit masking or not.

Moreover, benchmarking any code as part of the main app is never a good idea, there are numerous factors that distort the result (often make it completely useless). See Order of the code and performance. Always use Go's testing framework (with benchmarking capabilities) to more reliably benchmark code.

So let's modify the example functions:

func Mod1023() {
    for i := 23; i23 != 0; i   {
    }
}

func Mod1024() {
    for i := 24; i24 != 0; i   {
    }
}

(The loops in the above functions will have 1000 iterations.)

And let's write proper benchmarking functions using Go's testing framework:

func BenchmarkMod1023(b *testing.B) {
    for i := 0; i < b.N; i   {
        Mod1023()
    }
}

func BenchmarkMod1024(b *testing.B) {
    for i := 0; i < b.N; i   {
        Mod1024()
    }
}

Running the benchmark using go test -bench ., the output is:

BenchmarkMod1023-8        881263              1346 ns/op
BenchmarkMod1024-8       3710430               325.4 ns/op

So yes, modding by 1024 gets optimized to bitmasking and is in fact faster: about 4 times faster. A whole nanosecond is saved for you due to optimization (1.3 ns vs 0.3 ns). Although this only matters if you have to execute this millions of times and no other code are part of the execution that are slower than a simple CPU mod operation.