Interpolate NaN values in a large numpy array


I want to replace all NaN values in a numpy array of shape (63, 479060). I referred to the question Interpolate NaN values in a numpy array and tried the following code, but it does not produce the interpolated result (because of the large size of the array, I think).

import numpy as np
from scipy.interpolate import griddata

a = np.arange(30180780).reshape((63, 479060)).astype(float)
a[np.random.randint(2, size=(63, 479060)).astype(bool)] = np.nan
x, y = np.indices(a.shape)
mask = np.isnan(a)
interp = np.array(a)
interp[mask] = griddata(
    (x[~mask], y[~mask]),  # coordinates of the known points
    a[~mask],              # values at the known points
    (x[mask], y[mask]))    # coordinates of the NaN points to fill

Is there an efficient way to interpolate the NaN values in such a large array? Thanks a lot.

CodePudding user response:

Interpolation on unstructured meshes turns out to be very expensive. The SciPy code is fairly optimized, as it is written in Cython and uses the Qhull library internally. The algorithm first constructs the interpolant by triangulating the input data, and then performs a linear barycentric interpolation on each triangle. The computation of the Delaunay triangulation (running in O(n log n) time) is very slow in this case despite the use of a specialized native C library: nearly all the time is spent computing it, which can be checked by timing the two phases separately, as in the sketch below.
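
To make that claim concrete, griddata can be decomposed into its two phases by hand: scipy.spatial.Delaunay builds the triangulation, and LinearNDInterpolator (which accepts a precomputed triangulation) does the barycentric evaluation. A minimal sketch, using a smaller array so it finishes quickly (the sizes here are only illustrative):

import time
import numpy as np
from scipy.spatial import Delaunay
from scipy.interpolate import LinearNDInterpolator

a = np.arange(63 * 10000, dtype=float).reshape(63, 10000)
a[np.random.randint(2, size=a.shape).astype(bool)] = np.nan
x, y = np.indices(a.shape)
mask = np.isnan(a)
pts = np.column_stack((x[~mask], y[~mask]))

t0 = time.perf_counter()
tri = Delaunay(pts)                      # the expensive phase (Qhull)
t1 = time.perf_counter()
f = LinearNDInterpolator(tri, a[~mask])  # reuses the triangulation
filled = f(np.column_stack((x[mask], y[mask])))
t2 = time.perf_counter()
print(f"triangulation: {t1 - t0:.1f}s, evaluation: {t2 - t1:.1f}s")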

The code executed by Qhull appears to be clearly sub-optimal: it is sequential, it is not vectorized using SIMD instructions, and the binaries do not benefit from FMA instruction sets. It is also generic: it is not specifically optimized for the 2D case. An optimized, dedicated implementation could certainly be much faster, but it is hard/tedious to implement efficiently (even for quite skilled developers). Recompiling the Qhull library with more aggressive compiler optimizations (like -O3 and -march=native) should certainly help.

Another possible optimization consists in splitting the space into N parts and performing the linear interpolation on each part independently in N separate threads. This can be faster because SciPy releases the Global Interpreter Lock (GIL) during this computation, and the GIL is usually what prevents threads from speeding up compute-bound operations. That being said, it is not easy to split the space correctly because some points can lie on the boundary between parts. In practice, one needs to include an additional ghost area in each part of the unstructured mesh to do this correctly (which is unfortunately not trivial to implement). A simpler variant, sketched below, keeps one global triangulation and only parallelizes the evaluation step.
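
Here is a hedged sketch of that simpler variant: build the interpolant once on all known points (the Delaunay step stays sequential), then evaluate separate chunks of the query points in a thread pool, which sidesteps the ghost-area problem entirely. Whether the evaluation actually scales across threads depends on the SciPy version; pts, values and query are assumed to be the known coordinates, known values and NaN coordinates from the earlier code.

from concurrent.futures import ThreadPoolExecutor
import numpy as np
from scipy.interpolate import LinearNDInterpolator

def parallel_interpolate(pts, values, query, n_threads=8):
    # Build the interpolant once: the expensive Delaunay step is sequential.
    f = LinearNDInterpolator(pts, values)
    chunks = np.array_split(query, n_threads)
    # Evaluate chunks concurrently; the GIL is released during evaluation.
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        parts = list(pool.map(f, chunks))
    return np.concatenate(parts)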

Another solution consists in using approximations. Indeed, you can find the K nearest known points using a K-D tree or Ball-Tree algorithm (a K-D tree is available in SciPy as scipy.spatial.cKDTree; a Ball-Tree is implemented in scikit-learn) and then perform an interpolation based on the gathered points, as sketched below.
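
A hedged sketch of this approximation using scipy.spatial.cKDTree, with inverse-distance weighting over the k nearest known points standing in for the interpolation (k and the weighting scheme are free parameters, not prescribed by the answer):

import numpy as np
from scipy.spatial import cKDTree

def knn_fill(a, k=8, eps=1e-12):
    mask = np.isnan(a)
    x, y = np.indices(a.shape)
    known = np.column_stack((x[~mask], y[~mask]))
    tree = cKDTree(known)
    # Query the k nearest known grid points for every NaN location.
    dist, idx = tree.query(np.column_stack((x[mask], y[mask])), k=k)
    w = 1.0 / (dist + eps)        # inverse-distance weights (eps avoids 0-division)
    vals = a[~mask][idx]          # values of the k neighbours, shape (m, k)
    out = a.copy()
    out[mask] = (w * vals).sum(axis=1) / w.sum(axis=1)
    return out

On recent SciPy versions, passing workers=-1 to tree.query should also spread the neighbour search over all cores, which helps for an array of this size.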

Finally, a last solution consists in reimplementing the SciPy method using possibly more optimized libraries like CGAL, which is known to be quite fast (there is a Python binding, but I am not sure about its performance). It can easily compute the triangulation of the unstructured mesh (which should take a few seconds if optimized). Then, one can match the facets with the query points and evaluate the linear barycentric interpolation manually (see the sketch below). That being said, CGAL appears to support interpolation directly.
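
To illustrate the facet-matching and barycentric-evaluation step, here is a hedged sketch using scipy.spatial.Delaunay as a stand-in for a CGAL triangulation (it follows the standard recipe based on tri.find_simplex and tri.transform; with CGAL, the simplex lookup would be replaced by its own point-location query):

import numpy as np
from scipy.spatial import Delaunay

def barycentric_interpolate(tri, values, query):
    # tri: Delaunay triangulation of the known 2D points,
    # values: data at those points, query: (m, 2) points to fill.
    s = tri.find_simplex(query)        # containing triangle (-1 = outside hull)
    X = tri.transform[s, :2]           # (m, 2, 2) affine maps to barycentric space
    Y = query - tri.transform[s, 2]    # offsets from each triangle's reference vertex
    b = np.einsum('mij,mj->mi', X, Y)  # first two barycentric coordinates
    bary = np.column_stack((b, 1.0 - b.sum(axis=1)))
    out = (values[tri.simplices[s]] * bary).sum(axis=1)
    out[s == -1] = np.nan              # no extrapolation outside the hull
    return out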
