Home > Software engineering >  Thread safe caching of calculation results
Thread safe caching of calculation results

Time:03-16

I want to cache calculation results in a ConcurrentDictionary<TKey,TValue>. Several thread may query the cache for an entry and generate it if it does not exist. Since GetOrAdd(TKey, Func<TKey,TValue>) is not atomic, I think I should use GetOrAdd(TKey, TValue) with Task<CacheItem> as TValue. So, when a thread wants to query a cache item, it generates a cold task coldTask, that is a task, which is not started, and potentially generates the the item, calls var cacheTask = cache.GetOrAdd(key, coldTask) for some key object, and then checks whether cacheTask is started or even has a result. If cacheTask is not started, the calling thread starts the task.

Is this a valid approach in principle?

One problem that remains is that

if(cacheTask.Status == Status.Created)
  cacheTask.Start();

is not atomic, so the cacheTask may be started from another thread, before cacheTask.Start() is called here. Is

try {
if(cacheTask.Status == Status.Created)
  cacheTask.Start();
} catch {}

a valid workaround?

CodePudding user response:

As I suggested in the comments, I'd use TaskCompletionSource<TResult> and reference equality to avoid races and unnecessary additional tasks to be scheduled:

var tcs = new TaskCompletionSource<CacheItem>();
var actualTask = theDictionary.GetOrAdd(key, tcs.Task);
if(ReferenceEquals(actualTask, tcs.Task))
{
    //Do the actual work here
    tcs.SetResult(new CacheItem());
}
return actualTask;

CodePudding user response:

The principle should be fine, to start the task you should be able to do something like:

var newTask = new Task(...);
var dictionaryTask = myDictionary.GetOrAdd(myKey, newTask);
if(dictionaryTask  == newTask){
    newTask.Start();
}
return await dictionaryTask;

That should ensure that only the thread that created the task starts it.

I would suggest checking out Lazy<T> since it is somewhat related. I would also suggest doing some bench-marking, since the most appropriate approach will depend on your specific use case. Keep in mind that async/await, or blocking, a task will have some overhead, so it will depend on the cost of generating values, and the frequency this is done at.

  • Related