my test code below
type Data struct {
data int
}
func printData(c chan *Data) {
time.Sleep(time.Second * 1)
data := <-c
for {
data.data = 100
}
}
func main() {
fmt.Println("Main started...")
a := Data{data: 1}
b := &a
//create channel
c := make(chan *Data, 10)
go printData(c)
fmt.Println(fmt.Printf("Value of b before putting into channel %v", *b))
c <- b
for {
b.data = 20
}
}
I use go build -race
to build the binary, As expected a data race occurs
My doubt is I think the passing pointer is improper because when using the channel with multiple goroutines, the channel has a lock, but a passing pointer to channel, cause goroutines to need to add another lock to protect data race. It's weird
So my understood is right?
CodePudding user response:
You have a data race because two routines are trying to use the same underlying object without synchronization. You would get this same race without a channel if you stored a pointer to a shared object before creating the goroutine. You would not have a race if, after passing the pointer, you stopped using the underlying object in one of the two goroutines. So it's not the channel itself that is the issue, it is the use of a shared underlying object in two separate goroutines.
The method of sharing is not relevant, in other words. The fact that there is sharing is what is important.
One idea behind CSP is to share by communicating (i.e., send the actual data that need processing) rather than communicating by sharing (i.e., convincing two separate sequential processors that may run concurrently to behave well when when they share the data).
When the data themselves are very large, sending a pointer—and immediately going "hands off" of the underlying data—may be a good strategy, to reduce the cost of the data copy. But this is an optimization technique and should be done only after performance analysis shows that the data-copying is the expensive part of the operation. Adding a pointer indirection adds performance cost, which must be recovered by the reduced communications cost. In your particular case, the data to be operated upon consist of a single int
value, so the added pointer cost greatly exceeds any other reduced cost, and you should only share the data if that's required by the algorithm.