Pandas spline interpolation wrong?

Time:02-04

Pandas (version 1.3.5) and SciPy (version 1.7.3) give different results for spline interpolation, and as far as I understand, pandas is wrong:

import pandas as pd

df = pd.DataFrame(data = {'values': [10, 12, 15, None, None, None, None, 10, 5, 1, None, 0, 1, 3],})
df['interpolated_pandas'] = df['values'].interpolate(method='spline', axis=0, order=3)
df[['interpolated_pandas', 'values']].plot.line();

gives me: [plot of interpolated_pandas and values]

And

from scipy import interpolate

idx = ~df['values'].isna()
f = interpolate.interp1d(df[idx].index, df.loc[idx,'values'], kind=3) # kind: an integer specifying the order of the spline interpolator to use
df['interpolated_scipy'] = f(df.index)
df[['interpolated_scipy', 'values']].plot.line();

gives me: [plot of interpolated_scipy and values]

Is there something wrong in my code or is my understanding wrong? Or is this an actual bug in Pandas?

Answer:

A spline depends on a knot sequence. So my first guess would be that the two functions internally use different default knot locations.
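A minimal sketch to test that guess on the question's non-missing points. It assumes (as the other answer says) that pandas' `spline` method wraps `scipy.interpolate.UnivariateSpline`, and that `interp1d(kind=3)` builds a not-a-knot cubic interpolant equivalent to `scipy.interpolate.make_interp_spline(x, y, k=3)`; both are assumptions here, not something the question shows. It prints the distinct knot positions each approach ends up with.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline, make_interp_spline

# Non-missing points from the question: x is the index, y the value
x = np.array([0, 1, 2, 7, 8, 9, 11, 12, 13], dtype=float)
y = np.array([10, 12, 15, 10, 5, 1, 0, 1, 3], dtype=float)

# Smoothing spline with SciPy's default s (None means s = len(y) here)
smooth = UnivariateSpline(x, y, k=3)
# s=0 forces an interpolating spline through every point
exact = UnivariateSpline(x, y, k=3, s=0)
# Not-a-knot cubic interpolant, assumed to match what interp1d(kind=3) builds
nak = make_interp_spline(x, y, k=3)

# get_knots() returns the distinct knots (boundaries plus interior knots);
# nak.t repeats each boundary knot k+1 times, so deduplicate it
print("default s knots: ", smooth.get_knots())
print("s=0 knots:       ", exact.get_knots())
print("not-a-knot knots:", np.unique(nak.t))
```

If the guess is right, the default smoothing spline should report fewer knots than the other two, while the `s=0` spline and the not-a-knot interpolant should share the same knot positions.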

Answer:

Pandas uses UnivariateSpline, which by default applies a "smoothing factor used to choose the number of knots" (see the pandas source code and the SciPy documentation). To get the same result as `interp1d`, we need to add s=0 to the function call:

df['interpolated_pandas'] = df['values'].interpolate(method='spline', axis=0, order=3) # default with smoothing factor
df['interpolated_pandas_s0'] = df['values'].interpolate(method='spline', axis=0, order=3, s=0) # without smoothing factor and same as `interp1d`
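A self-contained check of this answer, written as a sketch under the assumption that pandas forwards `order` as `k` and any extra keyword (here `s`) unchanged to `UnivariateSpline`. It compares only the filled-in gaps, since pandas leaves the known values untouched, and uses `interp1d` exactly as the question does.

```python
import numpy as np
import pandas as pd
from scipy.interpolate import UnivariateSpline, interp1d

# Same data as in the question
ser = pd.Series([10, 12, 15, None, None, None, None, 10, 5, 1, None, 0, 1, 3], dtype=float)
known = ser.notna().to_numpy()
x = ser.index[known].to_numpy(dtype=float)
y = ser[known].to_numpy()
missing = ser.index[~known].to_numpy(dtype=float)

# What pandas fills into the gaps, with the default smoothing and with s=0
pd_default = ser.interpolate(method="spline", order=3)[~known].to_numpy()
pd_s0 = ser.interpolate(method="spline", order=3, s=0)[~known].to_numpy()

# Direct SciPy counterparts evaluated at the same missing positions
uni_default = UnivariateSpline(x, y, k=3)(missing)
scipy_interp = interp1d(x, y, kind=3)(missing)

# Check whether pandas' default matches a default-s UnivariateSpline
print(np.allclose(pd_default, uni_default))
# Check whether s=0 reproduces interp1d(kind=3)
print(np.allclose(pd_s0, scipy_interp))
# Check whether the original discrepancy is present
print(np.allclose(pd_default, scipy_interp))
```

Comparing numbers at the missing positions avoids eyeballing two plots, and calling `UnivariateSpline` directly shows which half of the discrepancy comes from the smoothing factor rather than from pandas itself.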