Multi-column interpolation in Python

Time:06-10

I want to use scipy or pandas to interpolate on a table like this one:

import pandas as pd
df = pd.DataFrame({'x': [1, 1, 1, 2, 2, 2], 'y': [1, 2, 3, 1, 2, 3], 'z': [10, 20, 30, 40, 50, 60]})

df = 
   x   y   z
0  1   1  10
1  1   2  20
2  1   3  30
3  2   1  40
4  2   2  50
5  2   3  60

I want to be able to interpolate for an x value of 1.5 and a y value of 2.5 and obtain 40.

The process would be:

  1. Starting from the first interpolation parameter (x), find the values that surround the target value. In this case the target is 1.5 and the surrounding values are 1 and 2.
  2. Interpolate in y for a target of 2.5 considering x=1. In this case between rows 1 and 2, obtaining 25.
  3. Interpolate in y for a target of 2.5 considering x=2. In this case between rows 4 and 5, obtaining 55.
  4. Interpolate the values from the previous steps to the target x value. In this case I have 25 for x=1 and 55 for x=2. The interpolated value for 1.5 is 40.

The order in which interpolation is to be performed is fixed and the data will be correctly sorted.
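
The steps above can be sketched with plain np.interp calls. This is only an illustration of the described procedure, not a library routine: the helper name interp_xy is made up here, and it assumes at least two distinct x values and data that forms a complete grid. Note that np.interp clamps to the end values when a target lies outside the data range.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [1, 1, 1, 2, 2, 2],
                   'y': [1, 2, 3, 1, 2, 3],
                   'z': [10, 20, 30, 40, 50, 60]})

def interp_xy(df, x_target, y_target):
    """Hypothetical helper: interpolate z in y for each bracketing x, then in x."""
    xs = np.sort(df['x'].unique())
    # Step 1: find the two x values that surround the target.
    hi = np.searchsorted(xs, x_target)
    hi = min(max(hi, 1), len(xs) - 1)  # keep the bracket inside the table
    x_lo, x_hi = xs[hi - 1], xs[hi]
    # Steps 2 and 3: interpolate in y at each of the two x values.
    z_at_y = []
    for x_val in (x_lo, x_hi):
        rows = df[df['x'] == x_val].sort_values('y')
        z_at_y.append(np.interp(y_target, rows['y'].to_numpy(), rows['z'].to_numpy()))
    # Step 4: interpolate the two intermediate results in x.
    return float(np.interp(x_target, [x_lo, x_hi], z_at_y))

result = interp_xy(df, 1.5, 2.5)
print(result)
```

For the table above, the intermediate values are 25 (x=1) and 55 (x=2), matching steps 2 and 3.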

I've found this question but I'm wondering if there is a standard solution already available in those libraries.

CodePudding user response:

You can use scipy.interpolate.interp2d (deprecated in SciPy 1.10 and removed in SciPy 1.14, so this requires an older SciPy release):

    import scipy.interpolate

    f = scipy.interpolate.interp2d(df.x, df.y, df.z)
    f([1.5], [2.5])
     [40.]

The line after the import creates an interpolation function z = f(x, y) from the three arrays for x, y, and z. The next line calls this function to interpolate z at the given x and y values. The default is linear interpolation.
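
On SciPy releases where interp2d is not available, scipy.interpolate.RegularGridInterpolator is one possible substitute. The following is a minimal sketch, assuming the table forms a complete rectangular grid (every x paired with every y); the variable names grid and value are chosen here for illustration.

```python
import pandas as pd
from scipy.interpolate import RegularGridInterpolator

df = pd.DataFrame({'x': [1, 1, 1, 2, 2, 2],
                   'y': [1, 2, 3, 1, 2, 3],
                   'z': [10, 20, 30, 40, 50, 60]})

# Reshape the long table into a 2-D grid: one row per x, one column per y.
grid = df.pivot(index='x', columns='y', values='z')

# Build a linear interpolator over the (x, y) grid axes.
f = RegularGridInterpolator(
    (grid.index.to_numpy(dtype=float), grid.columns.to_numpy(dtype=float)),
    grid.to_numpy(dtype=float),
    method='linear',
)

# Query points are passed as rows of (x, y); take the single result.
value = float(f([[1.5, 2.5]])[0])
print(value)
```

By default RegularGridInterpolator raises an error for points outside the grid, so targets are expected to lie within the tabulated x and y ranges.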

CodePudding user response:

Define your interpolate function:

def interpolate(x, y, df):
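    # Select the grid points surrounding (x, y) and average their z values.
    # The mean matches linear interpolation at the cell midpoint (as in this
    # example), but not for other targets in general.
    cond = df.x.between(int(x), int(x) + 1) & df.y.between(int(y), int(y) + 1)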
    return df.loc[cond].z.mean()

interpolate(1.5,2.5,df)
 40.0