Home > OS >  How to multiply 2 matrices using Parallel.ForEach?
How to multiply 2 matrices using Parallel.ForEach?

Time:05-25

There is a function that multiplies two matrices as usual

public IMatrix Multiply(IMatrix m1, IMatrix m2)
        {
            var resultMatrix = new Matrix(m1.RowCount, m2.ColCount);
            for (long i = 0; i < m1.RowCount; i  )
            {
                for (byte j = 0; j < m2.ColCount; j  )
                {
                    long sum = 0;
                    for (byte k = 0; k < m1.ColCount; k  )
                    {
                        sum  = m1.GetElement(i, k) * m2.GetElement(k, j);
                    }

                    resultMatrix.SetElement(i, j, sum);
                }
            }

            return resultMatrix;
        }

This function should be rewritten using Parallel.ForEach Threading, I tried this way

public IMatrix Multiply(IMatrix m1, IMatrix m2)
        {
            // todo: feel free to add your code here

            var resultMatrix = new Matrix(m1.RowCount, m2.ColCount);
            
            Parallel.ForEach(m1.RowCount, row =>
            {
                for (byte j = 0; j < m2.ColCount; j  )
                {
                    long sum = 0;
                    for (byte k = 0; k < m1.ColCount; k  )
                    {
                        sum  = m1.GetElement(row, k) * m2.GetElement(k, j);
                    }

                    resultMatrix.SetElement(row, j, sum);
                }
            });

            return resultMatrix;
        }

But there is an error with the type argument in the loop. How can I fix it?

CodePudding user response:

Just use Parallel.For instead of Parallel.Foreach, that should let you keep the exact same body as the non-parallel version:

Parallel.For(0, m1.RowCount, i =>{
   ...
}

Note that only fairly large matrices will benefit from parallelization, so if you are working with 4x4 matrices for graphics, this is not the approach to take.

One problem with multiplying matrices is that you need to access one value for each row for one of the matrices in your innermost loop. This access pattern may be difficult to cache by your processor, causing lots of cache misses. So a fairly easy optimization is to copy an entire column to a temporary array and do all computations that need this column before reading the next. This lets all memory access be nice and linear and easy to cache. this will do more work overall, but better cache utilization easily makes it a win. There are even more cache efficient methods, but the complexity also tend to increase.

Another optimization would be to use SIMD, but this might require platform specific code for best performance, and will likely involve more work. But you might be able to find libraries that are already optimized.

But perhaps most importantly, Profile your code. It is quite easy to have simple things consume lot of time. You are for example using an interface, so if you may have a virtual method call for each memory access that cannot be inlined, potentially causing a severe performance penalty compared to a direct array access.

CodePudding user response:

ForEach receives a collection, IEnumerable as the first argument and m1.RowCount is a number. Probably Parallel.For() is what you wanted.

  • Related