I am building an API that can queue up requests to GET an external website and subsequently perform some work by interacting with that API. I am trying to figure out how to avoid duplicate simultaneous goroutines.
That is, assume a request comes in for http://www.example.com. A goroutine is launched to handle that URL, which could take minutes or even hours. Any number of other requests can come in while this is happening. If the new requests are for URLs that are not already being worked on, they should proceed with their own goroutine.
However, if another request for example.com comes in, I want the thread that request comes in on to block until the previous example.com request is complete, and only then proceed. Duplication is fine: if the first task succeeded, the second will be a quick GET to confirm, and if it failed, trying again is fine.
All of the code examples I've found use channels or WaitGroups, but these all appear to block only one thread. That is, the first example.com thread is waiting on, say, a channel to return a value, but I can't have example.com request #2 wait on that same channel, since the result can only be read from the channel once.
If it makes any difference, I am building a worker pool with 5 workers, and a worker won't be allocated if another worker is already working the example.com request.
I've also considered using a buffer to keep track of the URLs that are currently being worked by the workers, but I wasn't sure how to block on waiting for the URL to be deleted from the buffer. I saw an example using an infinite for loop and a break for when the URL is no longer in the buffer, but that seems like unnecessary CPU abuse (indefinitely doing a for range on the buffer looking for the URL to no longer be present and to then break).
How do I queue these requests up?
Answer:
If you want to block a goroutine based on the URL, you can implement a scheme that looks like this:
First, keep a map of all URLs being worked on:
var urls = make(map[string]chan struct{})
var urllock = sync.Mutex{}
The following addURL adds a URL to the urls map if it is not already there, and returns true along with a channel that will be closed when that task completes. If the URL is already in the map, it returns false along with the existing channel to wait on.
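func addURL(u string) (bool, chan struct{}) {
	urllock.Lock()
	defer urllock.Unlock()
	ret, exists := urls[u]
	if exists {
		// Someone else is already working on u: do not start, wait on ret.
		return false, ret
	}
	// Nobody is working on u yet: register it and let the caller proceed.
	ret = make(chan struct{})
	urls[u] = ret
	return true, ret
}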
When you get a new URL to work on, try to put it on the map. If you can, then work on it:
workOnURL, ch := addURL(newURL)
if workOnURL {
	go func() {
		defer removeURL(newURL)
		// Work on URL
	}()
} else {
	<-ch // Wait for the goroutine to finish
	// Then, you can try rescheduling the same URL, or do something else
}
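With removeURL, remove the URL from the map and close the channel, so that every goroutine waiting on it can continue. A receive from a closed channel returns immediately for all receivers, so any number of waiters are released at once: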
func removeURL(u string) {
	urllock.Lock()
	defer urllock.Unlock()
	ret, exists := urls[u]
	if !exists {
		return // Nothing registered for u; closing a nil channel would panic
	}
	delete(urls, u)
	close(ret)
}
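Putting the pieces together, here is a minimal runnable sketch of the scheme above. It reuses addURL and removeURL as described; the process and runDemo helpers, the 10 ms sleep standing in for the real GET, and the choice to run the work on the calling goroutine (as a pool worker would) rather than in a new one are assumptions made for illustration only.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// urls maps each URL currently being worked on to a channel that is
// closed when that work finishes.
var (
	urls    = make(map[string]chan struct{})
	urllock sync.Mutex
)

// addURL returns true if the caller may work on u, or false plus the
// channel to wait on if another goroutine is already working on it.
func addURL(u string) (bool, chan struct{}) {
	urllock.Lock()
	defer urllock.Unlock()
	if ch, exists := urls[u]; exists {
		return false, ch
	}
	ch := make(chan struct{})
	urls[u] = ch
	return true, ch
}

// removeURL unregisters u and closes its channel, waking every waiter.
func removeURL(u string) {
	urllock.Lock()
	defer urllock.Unlock()
	if ch, exists := urls[u]; exists {
		delete(urls, u)
		close(ch)
	}
}

// process (hypothetical helper) blocks until no other goroutine is
// working on u, then runs work. Waiters retry after each wake-up, so
// duplicate requests run one after another instead of simultaneously.
func process(u string, work func(string)) {
	for {
		ok, ch := addURL(u)
		if ok {
			defer removeURL(u)
			work(u)
			return
		}
		<-ch // sleeps without spinning until removeURL closes the channel
	}
}

// runDemo (hypothetical helper) sends n simultaneous requests for one URL
// and reports how many ran and the most that ever ran at the same time.
func runDemo(n int) (runs, maxActive int) {
	var wg sync.WaitGroup
	var stats sync.Mutex
	active := 0
	work := func(string) {
		stats.Lock()
		active++
		runs++
		if active > maxActive {
			maxActive = active
		}
		stats.Unlock()
		time.Sleep(10 * time.Millisecond) // stand-in for the real GET
		stats.Lock()
		active--
		stats.Unlock()
	}
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			process("http://www.example.com", work)
		}()
	}
	wg.Wait()
	return runs, maxActive
}

func main() {
	runs, maxActive := runDemo(5)
	fmt.Println("runs:", runs, "max simultaneous:", maxActive)
}
```

With five simultaneous requests for the same URL, all five run, but never more than one at a time. Requests for different URLs use different map entries and so do not block each other. Note that waiters are not served in arrival order: when the channel is closed they all wake up and race to claim the URL again.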