Home > OS >  Different Benchmark results for functions with same early return statement
Different Benchmark results for functions with same early return statement

Time:07-08

I've implemented two functions for rotating an input array n times to the right.

Both implementations exit early, without performing anything, if the initial array would be equal to the resulting array after all rotations. This happens anytime n is 0 or a multiple of the length of the array.

func rotateRight1(nums []int, n int) {
    n = n % len(nums)
    if n == 0 {
        return
    }

    lastNDigits := make([]int, n)
    copy(lastNDigits, nums[len(nums)-n:])

    copy(nums[n:], nums[:len(nums)-n])
    copy(nums[:n], lastNDigits)
}

and

func rotateRight2(nums []int, n int) {
    n = n % len(nums)
    if n == 0 {
        return
    }

    i := 0
    current := nums[i]
    iAlreadySeen := i
    for j := 0; j < len(nums); j   {
        nextI := (i   n) % len(nums)

        nums[nextI], current = current, nums[nextI]

        i = nextI

        // handle even length arrays where i k might equal an already seen index
        if nextI == iAlreadySeen {
            i = (i   1) % len(nums)
            iAlreadySeen = i
            current = nums[i]
        }
    }
}

When benchmarking, I was surprised to see a 20x difference in speed when n equals 0 for both functions.

func BenchmarkRotateRight1(b *testing.B) {
    nums := make([]int, 5_000)

    b.ResetTimer()
    b.ReportAllocs()
    for i := 0; i < b.N; i   {
        rotateRight1(nums, 0)
    }
}

func BenchmarkRotateRight2(b *testing.B) {
    nums := make([]int, 5_000)

    b.ResetTimer()
    b.ReportAllocs()
    for i := 0; i < b.N; i   {
        rotateRight2(nums, 0)
    }
}

go test -bench=. yields a result like this consistently:

cpu: Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz
BenchmarkRotateRight1-4         1000000000               0.4603 ns/op          0 B/op          0 allocs/op
BenchmarkRotateRight2-4         97236492                12.11 ns/op            0 B/op          0 allocs/op
PASS

I don't understand this performance difference as both functions are basically doing the same thing and exiting early in the if k == 0 condition.

Could someone help me understand this?

go version go1.18 linux/amd64

CodePudding user response:

What you see is the result of compiler optimization. The compiler is clever enough to make the applications faster by inlining function calls. Sometimes, the compiler optimizes to remove function calls and artificially lowers the run time of benchmarks (that can be tricky).

After profiling, we can notice the function rotateRight1 is not even being called during the benchmark execution:

(pprof) list BenchmarkRotateRight1
Total: 260ms
ROUTINE ======================== main_test.BenchmarkRotateRight1 in (edited)
     260ms      260ms (flat, cum)   100% of Total
         .          .     43:func BenchmarkRotateRight1(b *testing.B) {
         .          .     44:   nums := make([]int, 5_000)
         .          .     45:
         .          .     46:   b.ResetTimer()
         .          .     47:   b.ReportAllocs()
     260ms      260ms     48:   for i := 0; i < b.N; i   {
         .          .     49:       rotateRight1(nums, 0)
         .          .     50:   }
         .          .     51:}

On the other hand, rotateRight2 is being called, and that's why you see that difference in the run time of the benchmarks:

(pprof) list BenchmarkRotateRight2
Total: 1.89s
ROUTINE ======================== main_test.BenchmarkRotateRight2 in (edited)
     180ms      1.89s (flat, cum)   100% of Total
         .          .     53:func BenchmarkRotateRight2(b *testing.B) {
         .          .     54:   nums := make([]int, 5_000)
         .          .     55:
         .          .     56:   b.ResetTimer()
         .          .     57:   b.ReportAllocs()
     130ms      130ms     58:   for i := 0; i < b.N; i   {
      50ms      1.76s     59:       rotateRight2(nums, 0)
         .          .     60:   }
         .          .     61:}

If you run your benchmark with -gcflags '-m -m' (double -m), you will see some of the compiler decisions to optimize the code:

$ go test -gcflags '-m -m' -benchmem -bench . main_test.go

# command-line-arguments_test [command-line-arguments.test]
./main_test.go:7:6: can inline rotateRight1 with cost 40 as: func([]int, int) { n = n % len(nums); if n == 0 { return  }; lastNDigits := make([]int, n); copy(lastNDigits, nums[len(nums) - n:]); copy(nums[n:], nums[:len(nums) - n]); copy(nums[:n], lastNDigits) }
./main_test.go:20:6: cannot inline rotateRight2: function too complex: cost 83 exceeds budget 80
...

So, based on a complexity threshold, the compiler decides whether to optimize or not.

You can use the//go:noinline directive right before the function you want to disable inlining optimization though. That will override the compiler's usual optimization rules:

//go:noinline
func rotateRight1(nums []int, n int) {
...
}

Now you will notice the benchmarks are very similar:

cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkRotateRight1
BenchmarkRotateRight1-16        135554571            8.886 ns/op           0 B/op          0 allocs/op
BenchmarkRotateRight2
BenchmarkRotateRight2-16        143716638            8.775 ns/op           0 B/op          0 allocs/op
PASS
  • Related