For odd integers T and S, Array1 and Array2 are both 2 dimensional ndarrays of shape (T, S) which represent samples of functions func1(t, s) and func2(t, s) on an equally spaced lattice L of T x S points stretching from t=0 to t=1 and s=-10 to s=10. I wish to obtain a third Array3 of shape (T, S) which takes values as func3(t, s) on this lattice, where func3(t, s) is the sum of func1(q, p)*func2(t-q, s-p) over all q and p such that both (q, p) and (t-q, s-p) are on the lattice L.
My current working method for doing this is padding Array2 with 0's to create an ExtendedArray2 of shape (2T-1, 2S-1) such that ExtendedArray2[(T-1) x][(S-1) y]=Array2[x][y] for 0 <= x <= T-1 and 0 <= y <= S-1, and takes values of 0 elsewhere. I convolve Array1 and ExtendedArray2 in the following way using signal from scipy to get my desired result.
Array3 = signal.fftconvolve(Array1, ExtendedArray2, mode='valid')
The fftconvolve is reasonably fast, but this method has the drawback that it requires doubling both dimensions of the input arrays, which makes the speed take a hit compared to what it could be on the smaller sized arrays. Is there a way a speed-efficient convolution method could be used on the unpadded arrays to obtain Array3?
CodePudding user response:
No, there is no faster method of computing this than what you are doing, unless there are specific properties of the functions to be exploited (e.g. if one of the functions is separable).
You can write faster code, though. For example, you could use the FFTW library instead of the FFT in SciPy. You should also ensure that ExtendedArray2
is of an easy size to apply the FFT to (i.e. a product of small integers). Instead of choosing the shape (2T-1, 2S-1), make it a bit larger, shape (2T-1 n, 2S-1 m), choosing non-negative n and m so that 2T-1 n and 2S-1 m are both a product of, for example, 2s, 3s and 5s.
CodePudding user response:
For the problem you described I would simply use
Array3 = signal.fftconvolve(Array1, Array2, mode='full')
It will calculate the a fast length for you (checking the code here)
Could you compare and let us know the run times you see.