I would like to concurrently perform an operation on the elements of a slice
I am using the sync/errgroup package to handle concurrency
Here is a minimal reproduction on Go Playground https://go.dev/play/p/yBCiy8UW_80
import (
"fmt"
"golang.org/x/sync/errgroup"
)
func main() {
eg := errgroup.Group{}
input := []int{0, 1, 2}
output1 := []int{}
output2 := make([]int, len(input))
for i, n := range input {
eg.Go(func() (err error) {
output1 = append(output1, n 1)
output2[i] = n 1
return nil
})
}
eg.Wait()
fmt.Printf("with append % v", output1)
fmt.Println()
fmt.Printf("with make % v", output2)
}
outputs
with append [3 3 3]
with make [0 0 3]
versus expected [1 2 3]
CodePudding user response:
There is no guarantee of the order when you run some go routines. So while it's okay to expect the elements 1
, 2
, 3
, you shouldn't make any assumption about the order.
Anyway, it looks like the first eg.Go()
call happens when for for
loop has actually reached it's third element. This is why you only get 3
, and with index access only at the 3rd position (where i=2
).
If you copy your values like this, the problem is somewhat fixed:
for i, n := range input {
nc, ic := n, i
eg.Go(func() (err error) {
output1 = append(output1, nc 1)
output2[ic] = nc 1
return nil
})
}
That said, the result looks like
with append [3 2 1]
with make [1 2 3]
for me, so the order still isn't we might have expected.
I'm no expert on the errgroup
package, though, so maybe somebody else can share more information about the order of execution.
CodePudding user response:
You have two separate issues going on here:
First, the variables in your loop are changing before each goroutine gets a chance to read them. When you have a loop like
for i, n, := range input {
// ...
}
the variables i
and n
live for the whole duration of the loop. When control reaches the bottom of the loop and jumps back up to the top, those variables get assigned new values. If a goroutine started in the loop is using those variables, then their value will change unpredictably. This is why you are seeing the same number show up multiple times in the output of your example. The goroutine started in the first loop iteration doesn't start executing until n
has already been set to 2.
To solve this, you can do what NotX's answer shows and create new variables that are scoped to just a single iteration of the loop:
for i, n := range input {
ic, nc := i, n
// use ic and nc instead of i and n
}
Variables declared inside a loop are scoped to just a single iteration of the loop, so when the next iteration of the loop starts, entirely new variables get created, preventing the originals from changing between when the goroutine is launched and when it actually starts running.
Second you are concurrently modifying the same value from different goroutines, which isn't safe. In particular, you're using append
to append to the same slice concurrently. What happens in this case is undefined and all kinds of bad things could happen.
There are two ways to deal with this. The first one you already have set up: pre-allocate an output slice with make
and then have each goroutine fill in a specific position in the slice:
output := make([]int, 3)
for i, n := range input {
ic, nc := i, n
eg.Go(func() (err error) {
output[ic] = nc 1
return nil
})
}
eg.Wait()
This works great if you know how many outputs you're going to have when you start the loop.
The other option is to use some kind of locking to control access to the output slice. sync.Mutex
works great for this:
var output []int
mu sync.Mutex
for _, n := range input {
nc := n
eg.Go(func() (err error) {
mu.Lock()
defer mu.Unlock()
output = append(output, nc 1)
return nil
})
}
eg.Wait()
This works if you don't know how many outputs you have, but it doesn't guarantee anything about the order of the output - it could be in any order. If you want to put it in order, you can always do some kind of sort after all the goroutines finish.