My data is randomly spaced (x, y, z, d) and I need to interpolate it onto a 3D grid. The problem is that the grid is pretty big: the total number of points is nx*ny*nz = 345*188*1501 = 97 354 860.
I tried to use Rbf interpolation, passing all points at once, but I get an error:
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 343. GiB for an array with shape (97354860, 473) and data type float64
(What is 473 here?)
As an example of what I tried:
import numpy as np
from scipy.interpolate import Rbf
rng = np.random.default_rng()
x, y, z, d = rng.random((4, 50))
rbfi = Rbf(x, y, z, d)
# xi, yi, zi altogether take about 3 GB
xi = yi = zi = np.linspace(0, 1, 97354860)
di = rbfi(xi, yi, zi)  # here I get the error
But then I tried to interpolate point by point in a loop, and it seems to work, though slowly (it takes about 20 minutes):
di = np.zeros(xi.shape[0])
for i in range(xi.shape[0]):
    di[i] = rbfi(xi[i], yi[i], zi[i])
So the question is: what is the difference between interpolating in a loop and interpolating all points at once? I thought Rbf would internally loop over each requested point (xi, yi, zi) anyway.
Also, I think there should be no difference in the interpolation result whether I pass all points at once or interpolate point by point.
CodePudding user response:
As you mention, when you evaluate all points at once, NumPy works on the whole data volume in one go, and that is limited by your system's RAM. When you loop, each iteration only touches a single point, so the RAM limit is never hit, but it is slow. A good compromise is to use chunks: divide the total number of query points into parts, loop over the parts, and evaluate each chunk separately. That way you stay within the RAM limit while still evaluating many points per call. The proposed approach looks like this:
chunk_val = 2000000  # arbitrary; choose it based on your system's RAM
n = xi.shape[0]
# by array
di = np.array([])
for start in range(0, n, chunk_val):
    stop = min(start + chunk_val, n)  # the last chunk may be smaller
    di = np.append(di, rbfi(xi[start:stop], yi[start:stop], zi[start:stop]))
# by list
di = []
for start in range(0, n, chunk_val):
    stop = min(start + chunk_val, n)
    di.append(rbfi(xi[start:stop], yi[start:stop], zi[start:stop]))
di = np.concatenate(di)  # stitch the chunk results back into one array
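Regarding the concern that the results might differ: they should not, since each query point is evaluated independently. As a quick sanity check (a minimal sketch; the sample size of 1000 is arbitrary), you can compare the start of the chunked output against an all-at-once evaluation of the same slice:
# compare the chunked output with a direct evaluation of the same small slice
sample = 1000  # arbitrary sample size, small enough to evaluate in one call
ref = rbfi(xi[:sample], yi[:sample], zi[:sample])
print(np.allclose(ref, di[:sample]))  # expected: True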
In the code above, two ways are proposed, one using arrays and another using lists. I have tested these methods in terms of performance; each took around 70 s on Google Colaboratory.
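As for the 473 in the error message: Rbf keeps the scattered data points it was fitted with, and when called it computes the distance from every requested point to every data point, so the shape (97354860, 473) is (number of query points, number of data points); 473 is how many scattered points your real Rbf was built from. A rough back-of-the-envelope check (a sketch, just reproducing the arithmetic) matches the 343 GiB figure:
# size of the float64 distance/kernel matrix Rbf builds when evaluating all points at once
n_query = 97354860  # number of grid points requested
n_data = 473        # number of scattered data points the Rbf was fitted with
bytes_needed = n_query * n_data * 8
print(bytes_needed / 2**30)  # ~343 GiB, matching the error message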
Maybe it could be handled by Numba, too, if Numba is compatible with the rbfi dtype. If so, it could be one of the fastest ways and would be worth evaluating.
Update:
I think the only way is to use chunks. As you can see in the Rbf source code, a series of functions in the Rbf class is run on evaluation, and their intermediate data has to stay in memory (RAM) until all of them have completed. (You can profile the Rbf code line by line with a profiler such as pprofile to see which parts of the source consume the most time and memory.) So, AFAIK, when you feed your data in chunks, those functions operate on a smaller data volume that the hardware can handle, and the temporary arrays created in each iteration are released or overwritten, so iterating does not blow up the memory.
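Since the output di is the only large array that has to persist, a variant that preallocates it and writes each chunk in place avoids the repeated copies made by np.append or np.concatenate. This is a sketch under the assumption that one float64 array of ~0.8 GB for the full grid fits in RAM:
# preallocate the full output once and fill it chunk by chunk,
# so no intermediate copies of di are created
n = xi.shape[0]
chunk_val = 2000000  # as above, tune to your RAM
di = np.empty(n, dtype=np.float64)  # ~0.8 GB for 97354860 float64 values
for start in range(0, n, chunk_val):
    stop = min(start + chunk_val, n)
    di[start:stop] = rbfi(xi[start:stop], yi[start:stop], zi[start:stop])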