Go benchmarking: dissonance between ns/op and runtime

I am benchmarking a software library I created in Go, and I encountered a dissonance between total runtime and ns/op. I am new to benchmarking, and neither Go's documentation nor past Stack Overflow questions cover benchmarking conceptually in much depth, so I am hoping someone with more conceptual knowledge can help me (and other Stack Overflow users in similar predicaments) understand what exactly is happening.

Benchmarking output for a task performed using native Go:

1000000000               0.6136 ns/op          0 B/op          0 allocs/op
PASS
ok      github.com/gabetucker2/gostack/benchmark        0.862s

Benchmarking output for the same task performed using my software library:

1576087               805.3 ns/op           544 B/op         21 allocs/op
PASS
ok      github.com/gabetucker2/gostack/benchmark        2.225s

Notice two things:

The ns/op of my software library is around 1300 times the ns/op of native Go
The runtime of my software library is around 2.6 times the runtime of native Go

It seems impossible to me that a very simple function from my software library should be 1300 times slower than native Go code, and it seems much more plausible that it is only 2 to 3 times slower... so what exactly is going on here?

Just in case it is useful, here are the Benchmark functions being called:

func test_Native_CreateArray() {

    myArr := []int {1, 2, 3}
    
    gogenerics.RemoveUnusedError(myArr)

}

func test_Gostack_CreateArray() {

    myStack := MakeStack([]int {1, 2, 3})
        
    gogenerics.RemoveUnusedError(myStack)

}

// native Go
func Benchmark_Native_CreateArray(b *testing.B) {
    for i := 0; i < b.N; i++ {
        test_Native_CreateArray()
    }
}
// my software library "gostack"
func Benchmark_Gostack_CreateArray(b *testing.B) {
    for i := 0; i < b.N; i++ {
        test_Gostack_CreateArray()
    }
}

Any clarity would be greatly appreciated.

Answer:

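The first function ran 1_000_000_000 times at 0.61 ns/op, which accounts for about 0.61 seconds of the 0.862-second total runtime.

The second function ran 1_576_087 times at 805 ns/op, which accounts for about 1.27 seconds of the 2.225-second total. The iteration counts differ because the testing package chooses b.N itself, increasing it until the loop runs for roughly the target benchmark time (1 second by default). The total runtime therefore says little about how fast each function is; the ns/op column is the figure to compare. Forcing the second function to run 1_000_000_000 times would take around 805 seconds.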
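
The arithmetic above can be checked with a short program. This is only a sketch: measuredTime is a hypothetical helper name (not part of the testing package), and the figures are copied from the two benchmark outputs in the question.

```go
package main

import (
	"fmt"
	"time"
)

// measuredTime is a hypothetical helper: it estimates the time spent inside
// the benchmark loop as iterations (b.N) multiplied by the reported ns/op.
func measuredTime(iterations int64, nsPerOp float64) time.Duration {
	// A time.Duration counts nanoseconds, so the product converts directly.
	return time.Duration(float64(iterations) * nsPerOp)
}

func main() {
	// Figures copied from the benchmark outputs in the question.
	native := measuredTime(1_000_000_000, 0.6136)
	gostack := measuredTime(1_576_087, 805.3)

	fmt.Println("native loop time: ", native)
	fmt.Println("gostack loop time:", gostack)

	// Per-operation ratio: the number that compares the two functions.
	fmt.Printf("ns/op ratio: %.0fx\n", 805.3/0.6136)

	// Hypothetical case: the gostack function run as often as the native one.
	fmt.Println("gostack at 1e9 iterations:", measuredTime(1_000_000_000, 805.3))
}
```

Both loop times land near one second because b.N was chosen to make them so; only the per-operation ratio reflects the difference between the two functions.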