I wrote this function uniq that takes in a sorted slice of ints and returns the slice with duplicates removed:
func uniq(x []int) []int {
	i := 0
	for i < len(x)-1 {
		if x[i] == x[i+1] {
			copy(x[i:], x[i+1:])
			x = x[:len(x)-1]
		} else {
			i++
		}
	}
	return x
}
and uniq2, a rewrite of uniq with the same results:
func uniq2(x []int) []int {
	i := 0
	l := len(x)
	for i < l-1 {
		if x[i] == x[i+1] {
			copy(x[i:], x[i+1:])
			l--
		} else {
			i++
		}
	}
	return x[:l]
}
The only difference between the two functions is that in uniq2, instead of re-slicing x and directly accessing len(x) each time, I save len(x) to a variable l and decrement it whenever I shift the slice.
I thought that uniq2 would be slightly faster than uniq because len(x) would no longer be called on every iteration, but in reality it is inexplicably much slower.
Here is a test that generates a random sorted slice and calls uniq/uniq2 on it 1000 times, which I run on Linux:
func main() {
	rand.Seed(time.Now().Unix())
	for i := 0; i < 1000; i++ {
		_ = uniq(genSlice())
		//_ = uniq2(genSlice())
	}
}
func genSlice() []int {
	x := make([]int, 0, 1000)
	for num := 1; num <= 10; num++ {
		amount := rand.Intn(1000)
		for i := 0; i < amount; i++ {
			x = append(x, num)
		}
	}
	return x
}
$ go build uniq.go
$ time ./uniq
uniq usually takes 5-6 seconds to finish, while uniq2 is more than two times slower, taking 12-15 seconds.
Why is uniq2, where I save the slice length to a variable, so much slower than uniq, where I directly call len? Shouldn't it be slightly faster?
CodePudding user response:
You expect roughly the same execution time because you think they do roughly the same thing.
The only difference between the two functions is that in uniq2, instead of slicing x and directly accessing len(x) each time, I save len(x) to a variable l and decrement it whenever I shift the slice.
This is wrong.
The first version does:
copy(x[i:], x[i+1:])
x = x[:len(x)-1]
And second does:
copy(x[i:], x[i+1:])
l--
The first difference is that the first version assigns (copies) a slice header, which is a reflect.SliceHeader value of 3 integers (24 bytes on a 64-bit architecture), while l-- is a simple decrement, which is much faster.
But the main difference does not stem from this. The main difference is that since the first version changes the x slice (the header, length included), you end up copying fewer and fewer elements, while the second version does not change x and always copies to the end of the slice: x[i+1:] is equivalent to x[i+1:len(x)].
To demonstrate, imagine you pass a slice with length=10 and having all equal elements. The first version will copy 9 elements first, then 8, then 7 etc. The second version will copy 9 elements first, then 9 again, then 9 again etc.
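The totals follow directly from this: for n equal elements, the shrinking version copies (n-1)+(n-2)+...+1 = n(n-1)/2 elements, while the fixed-end version copies n-1 elements on each of its n-1 iterations, i.e. (n-1)². A quick sketch of the arithmetic (the variable names here are mine):

```go
package main

import "fmt"

func main() {
	n := 1000
	// first version: the slice shrinks, so each copy is one element shorter
	shrinking := n * (n - 1) / 2
	// second version: every copy runs to the original end of the slice
	fixedEnd := (n - 1) * (n - 1)
	fmt.Println(shrinking, fixedEnd) // 499500 998001
}
```

For n = 1000 these give exactly the counts measured below.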
Let's modify your functions to count the number of copied elements:
func uniq(x []int) []int {
	count := 0
	i := 0
	for i < len(x)-1 {
		if x[i] == x[i+1] {
			count += copy(x[i:], x[i+1:])
			x = x[:len(x)-1]
		} else {
			i++
		}
	}
	fmt.Println("uniq copied", count, "elements")
	return x
}
func uniq2(x []int) []int {
	count := 0
	i := 0
	l := len(x)
	for i < l-1 {
		if x[i] == x[i+1] {
			count += copy(x[i:], x[i+1:])
			l--
		} else {
			i++
		}
	}
	fmt.Println("uniq2 copied", count, "elements")
	return x[:l]
}
Testing it:
uniq(make([]int, 1000))
uniq2(make([]int, 1000))
Output is:
uniq copied 499500 elements
uniq2 copied 998001 elements
uniq2()
copies twice as many elements!
If we test it with a random slice:
uniq(genSlice())
uniq2(genSlice())
Output is:
uniq copied 7956671 elements
uniq2 copied 11900262 elements
Again, uniq2()
copies roughly 1.5 times more elements! (But this greatly depends on the random numbers.)
Try the examples on the Go Playground.
The "fix" is to modify uniq2() to copy only up to l:
copy(x[i:], x[i+1:l])
l--
With this "appropriate" change, performance is roughly the same.
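Putting the fix together, a complete corrected uniq2 (with the same element counter as above, to confirm it now does the same amount of copying as uniq; the function name is mine):

```go
package main

import "fmt"

// uniq2Fixed copies only up to l, so the amount of work
// shrinks as duplicates are removed, just like in uniq.
func uniq2Fixed(x []int) []int {
	count := 0
	i := 0
	l := len(x)
	for i < l-1 {
		if x[i] == x[i+1] {
			count += copy(x[i:], x[i+1:l])
			l--
		} else {
			i++
		}
	}
	fmt.Println("uniq2Fixed copied", count, "elements")
	return x[:l]
}

func main() {
	uniq2Fixed(make([]int, 1000)) // copies 499500 elements, same as uniq
}
```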