Home > Back-end >  `sort.Slice` order is non-deterministic
`sort.Slice` order is non-deterministic

Time:11-25

I'm trying to use sort.Slice from the Go standard library to sort a slice of strings. I want them sorted alphabetically, except I want the empty string to appear after all other strings (hence I can't just use sort.Strings).

For the less function, I thought this would work:

func(i, j int) bool {
    return s[j] == "" || s[i] < s[j]
}

However, I seem to be getting random answers depending on what the input order is. Here's a MWE:

package main

import (
    "fmt"
    "math/rand"
    "sort"
    "time"
)

func main() {
    s := []string{"", "foo", "bar", "baz"}

    rand.Seed(time.Now().Unix())
    rand.Shuffle(len(s), func(i, j int) {
        s[i], s[j] = s[j], s[i]
    })
    fmt.Printf("%q\n", s)

    sort.Slice(s, func(i, j int) bool {
        return s[j] == "" || s[i] < s[j]
    })
    fmt.Printf("%q\n", s)
}

and here's the output from running that a few times:

$ go run ./z
["" "foo" "baz" "bar"]
["bar" "baz" "foo" ""]
$ go run ./z
["baz" "" "foo" "bar"]
["bar" "" "baz" "foo"]
$ go run ./z
["bar" "foo" "" "baz"]
["" "bar" "baz" "foo"]
$ go run ./z
["bar" "foo" "baz" ""]
["" "bar" "baz" "foo"]

CodePudding user response:

This is because your less() function isn't saying what you want it to say.

You said you want empty strings to be sorted after all non-empty strings. Your logic:

return s[j] == "" || s[i] < s[j]

This does tell if the second is "", then the first is less. This is more or less correct (except if both are empty, "is-less" is not really true: they are equal). But what if the first is "" and the second isn't? Then your function should return false but instead it returns s[i] < s[j]. If the second isn't empty, this will be true, telling "" is less than the other, exactly the opposite what you want.

The correct "is-less" relation is like this:

sort.Slice(s, func(i, j int) bool {
    if s[j] == "" && s[i] != "" {
        return true
    }
    if s[i] == "" && s[j] != "" {
        return false
    }
    return s[i] < s[j]
})

If only the second is "", you want the first to be less. If only the first is empty, you want it "not be less". Else use normal order (which is byte-wise).

Try it on the Go Playground.

Note that if both the first and second values would be empty, this function will return false because "" is not less than "" (they are equal). This is the proper value to return, although returning true here would still result in correct order (swapping empty elements would result in same result), but this may result in fewer swaps.

Transforming the logic using XOR

Note that in the custom logic there is deviation from the normal order if only one of the strings is empty. This is the logical XOR (Exclusive OR) relation: a XOR b is true if only a or only b is true. In Go there is no logical XOR operator, but a XOR b is equivalent to a != b.

If one empty string is "detected", the result is true if the second one is the empty (else false). So we could apply this identity transformation to our logic:

sort.Slice(s, func(i, j int) bool {
    // Move empty elements to the end:
    if (s[i] == "") != (s[j] == "") { // If only one is empty
        return s[j] == ""
    }
    return s[i] < s[j]
})

This is shorter and probably more efficient, but as you can see, it's harder to understand. Use this only if performance does matter. Try this one on the Go Playground.

  • Related