How can I use scipy interp1d with N-D array for x without for loop


How can I use scipy.interpolate.interp1d when my x array is an N-D array, instead of a 1-D array, without using a loop?

The function f returned by interp1d then needs to be evaluated at values produced by numpy.percentile applied to one of the arrays.

I think there should be a way to do it with a list comprehension or lambda function, but I am still learning these tools.

(Note that this differs from my recent question here, because I mixed up the x and y arrays in that post and the problem was not reproducible.)

Problem statement/example:

import numpy as np
from scipy import interpolate

# a is y in the interp1d docs
a = np.array([97, 4809, 4762, 282, 3879, 17454, 103, 2376, 40581])

# b is x in the interp1d docs
b = np.array([
    [0.14, 0.11, 0.29, 0.11, 0.09, 0.68, 0.09, 0.18, 0.5],
    [0.32, 0.25, 0.67, 0.25, 0.21, 1.56, 1.60, 0.41, 1.15],
])

Just trying this, below, fails with ValueError: x and y arrays must be equal in length along interpolation axis. The expected result is 97.0 and 2376.0, one value per row of b. The median is used here, but the 10th, 90th, and other percentiles will also need to be considered.

f = interpolate.interp1d(b, a, axis=0)
f(np.percentile(b, 50, axis=0))

However, this, below, works and returns array(97.)

f = interpolate.interp1d(b[0,:], a, axis=0)
f(np.percentile(b[0,:], 50, axis=0))

A loop works, but I am wondering if there is a solution using list comprehensions, lambda functions, or some other technique.

out = []
for i in range(b.shape[0]):
    f = interpolate.interp1d(b[i, :], a, axis=0)
    out.append(f(np.percentile(b[i, :], 50, axis=0)))
print(out)
# returns
# [array(97.), array(2376.)]

Efforts:

I understand I can loop through the b array with a list comprehension.

[b[i,:] for i in range(b.shape[0])]
# returns
# [array([0.14, 0.11, 0.29, 0.11, 0.09, 0.68, 0.09, 0.18, 0.5 ]),
# array([0.32, 0.25, 0.67, 0.25, 0.21, 1.56, 1.6 , 0.41, 1.15])]

And I also understand that I can use a list comprehension to create the scipy function f for each row of b:

[interpolate.interp1d(b[i, :], a, axis=0) for i in range(b.shape[0])] 
# returns 
# [<scipy.interpolate.interpolate.interp1d at 0x1b72e404360>,
#  <scipy.interpolate.interpolate.interp1d at 0x1b72e404900>]

But I don't know how to combine these two list comprehensions to apply the np.percentile function.

Using Python 3.8.3, NumPy 1.18.5, SciPy 1.3.2

CodePudding user response:

If you have large data arrays, you want to stay away from for loops, map, np.vectorize, and comprehensions: they will all be slow. Instead, use vectorized numpy or scipy operations whenever possible.

In this particular case, you can implement the vectorization pretty trivially yourself. interp1d defaults to a linear interpolation, which is very simple to code by hand. For a general interpolator, the first step would be to sort x and y, which is why scipy can't support multiple x for a given y. If the x rows all have different sort order, what do you do with the y?
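
To see how simple the linear case is, here is a minimal hand-rolled sketch for a single sorted row (the function name lerp is a placeholder, and it assumes xs is sorted ascending with no duplicate values and that q lies inside the data range):

def lerp(xs, ys, q):
    # locate the first index j with xs[j] >= q
    j = np.searchsorted(xs, q)
    # keep both j - 1 and j inside the array
    j = np.clip(j, 1, len(xs) - 1)
    # fractional position of q between the two surrounding x samples
    t = (q - xs[j - 1]) / (xs[j] - xs[j - 1])
    return ys[j - 1] + t * (ys[j] - ys[j - 1])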

Luckily, there are a couple of things you can do to make this much faster than having to build a full interpolator or argsort y multiple times. For example, start by argsorting x:

idx = b.argsort(axis=1)

idx is now an array such that b[np.arange(2)[:, None], idx] gives the sorted version of b along axis 1, and a[idx] gives the corresponding y-values. Since you are taking the median (50th percentile), and each row has an odd number of elements, the median x is just the middle element of each sorted row, and the corresponding y is given by

a[idx[:, len(a) // 2]]
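
Putting those two lines together on the example arrays from the question reproduces the values the loop gave:

idx = b.argsort(axis=1)
median_y = a[idx[:, len(a) // 2]]
# median_y -> 97 and 2376, one value per row of b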

If you had an even number of elements, you would have to average the elements surrounding the middle:

i = len(a) // 2 - 1
a[idx[:, i:i + 2]].mean(axis=1)

You can reduce algorithmic complexity by using np.argpartition instead of a full-blown np.argsort to get the middle element(s).
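
As a sketch of the odd-length (median) case with argpartition, assuming the same example arrays (variable names are placeholders):

k = len(a) // 2
idx_part = b.argpartition(k, axis=1)  # only position k is guaranteed to be in sorted order
median_y = a[idx_part[:, k]]          # -> 97 and 2376 for the example data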

CodePudding user response:

interp1d and other interpolators from scipy.interpolate only support 1D x arrays. So you'll need to loop over the dimensions of x manually.
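
For example, the loop from the question can be written as a comprehension over the rows of b; this is only a sketch, and it is still a Python-level loop, just expressed more compactly:

out = [
    interpolate.interp1d(row, a, axis=0)(np.percentile(row, 50))
    for row in b
]
# out -> [array(97.), array(2376.)]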
