Lock free iterating / indexing jagged arrays-CodePudding

For parallel work on an acceleration data structure I currently use a SpinLock but would like to design the algorithm lock free. The data structure is a jagged array where each inner array has a different size. The working threads should fetch the next element in the inner array, increment the index and, if the index is larger, switch to the next index in the outer array:

for(int i = 0; i < arr.Length;   i)
{
  for(int j = 0; j < arr[i].Length;   j)
  {
    DoWork(arr[i][j]);
  }
}

I can't think of a way to do this lock free except to increment a shared index and then sum up the lengths of the arrays:

int sharedIndex = -1;

// -- In the worker thread ---------------------
bool loop = false;
        
do
{
  int index = Interlocked.Increment(ref sharedIndex);
  int count = 0;
  loop = false;
            
  for(int i = 0; i < arr.Length;   i)
  {
    count  = arr[i].Length;
                                
    if(count > index)
    {
      var remaining = index - (count - arr[i].Length);
      DoWork(arr[i][remaining]);
      loop = true;
      break;
    }
  }
} while(loop);

Is there a way to not have to loop over the entire outer array and still remain lock free? Because I can't increment two indexes at the same time (for the outer and inner index).

CodePudding user response：

Can you divide up work by having each thread do one to four outer iterations between synchronization steps? If outer_size / chunk_size / threads is at least 4 or so (or maybe greater than the expected ratio between your shortest and longest inner arrays), scheduling of work is dynamic enough that you should usually avoid having one thread running for a long time on a very long array while the other threads have all finished.

That might still be a risk if the very last inner array is longer than the others. Depending on how common that is, and/or how important it is to avoid that worst-case scenario, you might look at the inner sizes ahead of time and sort or partition them to start working on the longest inner arrays first, so at the end the differences between threads finishing are the differences in lengths of the shorter arrays. (e.g. real-time where limiting the worst case is more important than speeding up the average, vs. a throughput-oriented use-case. Also if there's anything useful for other threads to be doing with free CPU cores if you don't schedule this perfectly.)

Atomically incrementing a shared counter for every inner element would serialize all threads on that, so unless processing each inner element was very expensive, it would be much slower than single-threaded without synchronization.

I'm assuming you don't need to start work on each element in sequential order, since even a shared counter wouldn't guarantee that (a thread could sleep after incrementing, with another thread starting the element after).

CodePudding user response：

You could optimize your current algorithm by doing binary search of a precomputed array, that contains the accumulated length of all the arrays up to this index. So for example if you have a jagged array of 10 inner arrays with lengths 8, 9, 5, 4, 0, 0, 6, 4, 4, 7, then the precomputed array will contain the values 0, 8, 17, 22, 26, 26, 26, 32, 36, 40. Doing a binary search will get you directly to the inner array that corresponds to the index that you are searching for, doing only O(Log n) comparisons.

Here is an implementation of this idea:

// --- Preparation ------------------------------
int[] indices = new int[arr.Length];
indices[0] = 0;
for (int i = 1; i < arr.Length; i  )
    indices[i] = indices[i - 1]   arr[i - 1].Length;

int sumLength = arr.Sum(inner => inner.Length);

int sharedIndex = -1;

// --- In the worker thread ---------------------
while (true)
{
    int index = Interlocked.Increment(ref sharedIndex);
    if (index >= sumLength) break;
    int outerIndex = Array.BinarySearch(indices, index);
    if (outerIndex < 0) outerIndex = (~outerIndex) - 1;
    while (arr[outerIndex].Length == 0) outerIndex  ; // Skip empty arrays
    int innerIndex = index - indices[outerIndex];
    DoWork(arr[outerIndex][innerIndex]);
}