The code I have posted below is a minimal reproducible version, as I have been trying to isolate the problem. I am coming from Python and need to rewrite this script in Go for performance reasons, particularly to use parallelization, which I have removed from the example.
The problem is that I pass N values to the sorting function and get more than N values back. It creates a new slice for each iteration of the outer loop and seems to ignore the if !message1.Grouped condition. I do not have much experience with Go, and I have this working in Python. I assume it has something to do with message2.Grouped = true not being seen by the outer loop for some reason.
Ultimately, I'm trying to ignore 'messages' that have already been grouped earlier in the loop.
Side note: I know the random in this script is not working because I have not set a new seed, but that is beside the point and is not part of my actual script.
package main

import (
	"fmt"
	"math/rand"
)

type (
	BoolInt struct {
		Val     int
		Grouped bool
	}
)

func sort_chunk_no_p(chunk []BoolInt) [][]BoolInt {
	COSINE_THRESHOLD := 0.90
	allGroups := [][]BoolInt{}
	for i, message1 := range chunk {
		if !message1.Grouped {
			message1.Grouped = true
			tempGroup := []BoolInt{message1}
			for _, message2 := range chunk[i+1:] {
				if !message2.Grouped {
					if rand.Float64() >= COSINE_THRESHOLD {
						message2.Grouped = true
						tempGroup = append(tempGroup, message2)
					}
				}
			}
			allGroups = append(allGroups, tempGroup)
		}
	}
	return allGroups
}

func main() {
	lo, hi := 1, 100
	allMessages := make([]BoolInt, hi-lo+1)
	for i := range allMessages {
		allMessages[i].Val = i + lo
		allMessages[i].Grouped = false
	}
	sorted_chunk := sort_chunk_no_p(allMessages)
	fmt.Println(sorted_chunk)
	sum := 0
	for _, res := range sorted_chunk {
		sum += len(res)
	}
	fmt.Println(sum)
}
CodePudding user response:
When you iterate over a slice with for ... range, each element is copied into the loop variable (a single reused variable before Go 1.22, a fresh variable per iteration since then). Either way, modifying fields of that copy does not affect the elements in the slice.
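A minimal sketch of that copy behavior, assuming the BoolInt type from the question. The helper name markViaCopy is hypothetical and exists only for this illustration:

```go
package main

import "fmt"

// BoolInt mirrors the type from the question.
type BoolInt struct {
	Val     int
	Grouped bool
}

// markViaCopy (hypothetical helper) sets Grouped on the range variable and
// returns how many copies it changed.
func markViaCopy(items []BoolInt) int {
	changed := 0
	for _, item := range items {
		item.Grouped = true // writes to the loop's copy, not to the slice element
		if item.Grouped {
			changed++
		}
	}
	return changed
}

func main() {
	items := []BoolInt{{Val: 1}, {Val: 2}}
	fmt.Println(markViaCopy(items)) // 2
	fmt.Println(items)              // [{1 false} {2 false}]
}
```

Both copies were changed inside the loop, yet the slice still reports Grouped as false for every element, which is why the outer loop in the question never sees the flag.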
Either store pointers in the slice (the elements are still copied, but they are now pointers that point to the same struct value), or modify the elements through an index expression such as chunk[i].Grouped = true.
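The index-expression option could be sketched roughly as follows. This is an illustration under assumptions, not the asker's final code: groupByIndex and its threshold parameter are hypothetical names, and rand.Float64 stands in for the real similarity score, as it does in the question.

```go
package main

import (
	"fmt"
	"math/rand"
)

// BoolInt mirrors the type from the question.
type BoolInt struct {
	Val     int
	Grouped bool
}

// groupByIndex (hypothetical name) is an index-based variant of
// sort_chunk_no_p. It sets Grouped through chunk[i] and chunk[j], so the
// flag is stored in the slice element rather than in a loop copy.
func groupByIndex(chunk []BoolInt, threshold float64) [][]BoolInt {
	allGroups := [][]BoolInt{}
	for i := range chunk {
		if chunk[i].Grouped {
			continue // already placed in an earlier group
		}
		chunk[i].Grouped = true
		tempGroup := []BoolInt{chunk[i]}
		for j := i + 1; j < len(chunk); j++ {
			if chunk[j].Grouped {
				continue
			}
			// Placeholder for the real similarity test.
			if rand.Float64() >= threshold {
				chunk[j].Grouped = true
				tempGroup = append(tempGroup, chunk[j])
			}
		}
		allGroups = append(allGroups, tempGroup)
	}
	return allGroups
}

func main() {
	lo, hi := 1, 100
	allMessages := make([]BoolInt, hi-lo+1)
	for i := range allMessages {
		allMessages[i].Val = i + lo
	}
	sum := 0
	for _, group := range groupByIndex(allMessages, 0.90) {
		sum += len(group)
	}
	fmt.Println(sum == len(allMessages)) // true: each message is in exactly one group
}
```

Note that this variant changes the Grouped flags in the caller's slice, because the function and the caller share the same backing array. The groups themselves hold copies of the elements, taken after the flag was set.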
Using pointers, it would look like this:
func sort_chunk_no_p(chunk []*BoolInt) [][]*BoolInt {
	COSINE_THRESHOLD := 0.90
	allGroups := [][]*BoolInt{}
	for i, message1 := range chunk {
		if !message1.Grouped {
			message1.Grouped = true
			tempGroup := []*BoolInt{message1}
			for _, message2 := range chunk[i+1:] {
				if !message2.Grouped {
					if rand.Float64() >= COSINE_THRESHOLD {
						message2.Grouped = true
						tempGroup = append(tempGroup, message2)
					}
				}
			}
			allGroups = append(allGroups, tempGroup)
		}
	}
	return allGroups
}
And calling it:
allMessages := make([]*BoolInt, hi-lo+1)
for i := range allMessages {
	allMessages[i] = &BoolInt{Val: i + lo}
}
sorted_chunk := sort_chunk_no_p(allMessages)
Try it on the Go Playground.
See related:
Register multiple routes using range for loop slices/map
Why do these two for loop variations give me different behavior?