Optimizing array multiplication in fortran-CodePudding

I am trying to optimize the array multiplication of the following type in fortran:

    do i = 1, n
    do j = 1, n
    do k = 1, n
            do i1 = 1, n
            do j1 = 1, n
            do k1 = 1, n
                    B(k1,j1,i1) = B(k1,j1,i1)   &
                    A(k,j,i)*v1(k,k1)*v2(j,j1)*v3(i,i1)
            enddo
            enddo
            enddo
    enddo
    enddo
    enddo

Here A and B are of size (n,n,n) while v1, v2, and v3 are (n,n). It seems that there ought to be vectorization of some sort that would allow me to speed up this evaluation. I tried different ordering of the indices. Is there a faster way of calculating B?

EDIT:

Alright, one can use matmul to obtain about 10x speed up:

    do i = 1, n
    do i1 = 1, n
    do j = 1, n
    do j1 = 1, n
            B(:,j1,i1) = B(:,j1,i1)   &   
            matmul(A(:,j,i),v1)*v2(j,j1)*v3(i,i1)                                             
    enddo
    enddo
    enddo
    enddo

Can we do even better?

CodePudding user response：

You can at the very least reduce two of the loop pairs to matrix multiplication, using either matmul or BLAS/LAPACK, e.g.

do i1=1,n
  do i=1,n
    B(:,:,i1) = B(:,:,i1) &
            &   matmul(matmul(transpose(v1), A(:,:,i)), v2) * v3(i,i1)
  enddo
enddo

You can almost certainly speed this up further by caching the results of the double matmul, e.g.

real :: intermediate(n,n,n)

do i=1,n
  intermediate(:,:,i) = matmul(matmul(transpose(v1), A(:,:,i)), v2)
enddo

do i1=1,n
  do i=1,n
    B(:,:,i1) = B(:,:,i1)   intermediate(:,:,i) * v3(i,i1)
  enddo
enddo

Which might in turn be sped up with more matmul, e.g.

real :: intermediate(n,n,n)

do i=1,n
  intermediate(:,:,i) = matmul(matmul(transpose(v1), A(:,:,i)), v2)
enddo

do j1=1,n
  B(:,j1,:) = matmul(intermediate(:,j1,:), v3))
enddo