I want to cache calculation results in a ConcurrentDictionary<TKey,TValue>. Several thread may query the cache for an entry and generate it if it does not exist.
Since GetOrAdd(TKey, Func<TKey,TValue>) is not atomic, I think I should use GetOrAdd(TKey, TValue) with Task<CacheItem>
as TValue
.
So, when a thread wants to query a cache item, it generates a cold task coldTask
, that is a task, which is not started, and potentially generates the the item, calls var cacheTask = cache.GetOrAdd(key, coldTask)
for some key
object, and then checks whether cacheTask
is started or even has a result. If cacheTask
is not started, the calling thread starts the task.
Is this a valid approach in principle?
One problem that remains is that
if(cacheTask.Status == Status.Created)
cacheTask.Start();
is not atomic, so the cacheTask
may be started from another thread, before cacheTask.Start()
is called here.
Is
try {
if(cacheTask.Status == Status.Created)
cacheTask.Start();
} catch {}
a valid workaround?
CodePudding user response:
As I suggested in the comments, I'd use TaskCompletionSource<TResult>
and reference equality to avoid races and unnecessary additional tasks to be scheduled:
var tcs = new TaskCompletionSource<CacheItem>();
var actualTask = theDictionary.GetOrAdd(key, tcs.Task);
if(ReferenceEquals(actualTask, tcs.Task))
{
//Do the actual work here
tcs.SetResult(new CacheItem());
}
return actualTask;
CodePudding user response:
The principle should be fine, to start the task you should be able to do something like:
var newTask = new Task(...);
var dictionaryTask = myDictionary.GetOrAdd(myKey, newTask);
if(dictionaryTask == newTask){
newTask.Start();
}
return await dictionaryTask;
That should ensure that only the thread that created the task starts it.
I would suggest checking out Lazy<T>
since it is somewhat related. I would also suggest doing some bench-marking, since the most appropriate approach will depend on your specific use case. Keep in mind that async/await, or blocking, a task will have some overhead, so it will depend on the cost of generating values, and the frequency this is done at.