Why is my sorting function returning more values than input?

The code I have posted below is a minimal reproducible example, as I have been trying to isolate the problem. I am coming from Python and need to rewrite this script in Go for performance reasons, particularly to use parallelization, which I have removed from the example.

The problem is that I pass N values to the sorting function and get more than N values back. It creates a new slice for each iteration of the outer loop and seems to ignore the if !message1.Grouped condition. I do not have much experience with Go, and I have this working in Python. I assume it has something to do with message2.Grouped = true not being seen by the outer loop for some reason. Ultimately, I'm trying to skip 'messages' that have already been grouped earlier in the loop.

Side note: I know the randomness in this script is not working as intended because I have not set a new seed, but that is beside the point and is not part of my actual script.

package main
import (
    "fmt"
    "math/rand"
)

type (
    BoolInt struct {
        Val int
        Grouped bool
    }
)


func sort_chunk_no_p(chunk []BoolInt) [][]BoolInt {
    COSINE_THRESHOLD := 0.90
    allGroups := [][]BoolInt{}
    for i, message1 := range chunk {
        if !message1.Grouped {
            message1.Grouped = true
            tempGroup := []BoolInt{message1}
            for _, message2 := range chunk[i+1:] {
                if !message2.Grouped {
                    if rand.Float64() >= COSINE_THRESHOLD {
                        message2.Grouped = true
                        tempGroup = append(tempGroup, message2)
                    }   
                }

            }
            allGroups = append(allGroups, tempGroup)
        }
    }
    return allGroups
}

func main() {
    lo, hi := 1, 100
    allMessages := make([]BoolInt, hi-lo+1)
    for i := range allMessages {
        allMessages[i].Val = i + lo
        allMessages[i].Grouped = false
    }

    sorted_chunk := sort_chunk_no_p(allMessages)


    fmt.Println(sorted_chunk)
    sum := 0
    for _, res := range sorted_chunk {
        sum += len(res)
    }
    fmt.Println(sum)
}

CodePudding user response：

When you range over a slice, each element is copied into the loop variable (a single variable reused across iterations before Go 1.22, a fresh one per iteration since). Either way, modifying fields of this copy does not affect the elements in the slice.
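To see the copy semantics in isolation, here is a minimal sketch. The Item type and both function names are hypothetical and only stand in for the question's BoolInt and its loops; they are not part of the original code.

```go
package main

import "fmt"

// Item is a hypothetical stand-in for the question's BoolInt.
type Item struct {
	Val     int
	Grouped bool
}

// markViaRangeCopy flips Grouped on the loop variable only and returns how
// many copies it changed. The slice elements themselves stay untouched.
func markViaRangeCopy(items []Item) int {
	changed := 0
	for _, it := range items {
		if !it.Grouped {
			it.Grouped = true // modifies the copy, not items[i]
			changed++
		}
	}
	return changed
}

// markViaIndex flips Grouped on the slice elements through an index expression.
func markViaIndex(items []Item) {
	for i := range items {
		items[i].Grouped = true
	}
}

func main() {
	items := []Item{{Val: 1}, {Val: 2}}

	markViaRangeCopy(items)
	fmt.Println(items) // prints [{1 false} {2 false}]

	markViaIndex(items)
	fmt.Println(items) // prints [{1 true} {2 true}]
}
```

This is the same effect as in the question: the flag is set on a copy, so every later check of the slice still sees Grouped == false.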

Either store pointers in the slice (the loop variable is still a copy, but it is now a copy of a pointer that refers to the same struct value), or modify elements via an index expression such as chunk[i].Grouped = true.

Using pointers, it would look like this:

func sort_chunk_no_p(chunk []*BoolInt) [][]*BoolInt {
    COSINE_THRESHOLD := 0.90
    allGroups := [][]*BoolInt{}
    for i, message1 := range chunk {
        if !message1.Grouped {
            message1.Grouped = true
            tempGroup := []*BoolInt{message1}
            for _, message2 := range chunk[i+1:] {
                if !message2.Grouped {
                    if rand.Float64() >= COSINE_THRESHOLD {
                        message2.Grouped = true
                        tempGroup = append(tempGroup, message2)
                    }
                }

            }
            allGroups = append(allGroups, tempGroup)
        }
    }
    return allGroups
}

And calling it:

allMessages := make([]*BoolInt, hi-lo+1)
for i := range allMessages {
    allMessages[i] = &BoolInt{Val: i + lo}
}

sorted_chunk := sort_chunk_no_p(allMessages)

Try it on the Go Playground.
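For comparison, the index-expression alternative could look roughly like the sketch below, which keeps the slice of plain BoolInt values. The function name groupByIndex and the similar parameter are assumptions made for illustration: similar replaces the question's rand.Float64() >= COSINE_THRESHOLD check so that the result is deterministic.

```go
package main

import "fmt"

// BoolInt mirrors the struct from the question.
type BoolInt struct {
	Val     int
	Grouped bool
}

// groupByIndex is a hypothetical variant of sort_chunk_no_p. It sets the
// Grouped flag through index expressions, so the slice elements themselves
// are updated. similar is an assumed stand-in for the random threshold check.
func groupByIndex(chunk []BoolInt, similar func(a, b BoolInt) bool) [][]BoolInt {
	allGroups := [][]BoolInt{}
	for i := range chunk {
		if chunk[i].Grouped {
			continue // already placed in an earlier group
		}
		chunk[i].Grouped = true
		group := []BoolInt{chunk[i]}
		for j := i + 1; j < len(chunk); j++ {
			if !chunk[j].Grouped && similar(chunk[i], chunk[j]) {
				chunk[j].Grouped = true
				group = append(group, chunk[j])
			}
		}
		allGroups = append(allGroups, group)
	}
	return allGroups
}

func main() {
	lo, hi := 1, 100
	allMessages := make([]BoolInt, hi-lo+1)
	for i := range allMessages {
		allMessages[i].Val = i + lo
	}

	// Illustrative predicate: values with the same last digit belong together.
	sameLastDigit := func(a, b BoolInt) bool { return a.Val%10 == b.Val%10 }

	groups := groupByIndex(allMessages, sameLastDigit)
	sum := 0
	for _, g := range groups {
		sum += len(g)
	}
	fmt.Println(len(groups), sum) // prints 10 100
}
```

Note that this version marks the elements of the caller's slice as grouped, because the slice passed in shares its backing array with the caller. The groups themselves hold copies of the elements.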

See related:

Using Pointers in a for loop

Why do these two for loop variations give me different behavior?