With a dataframe like this:
import pandas as pd
df = pd.DataFrame([
{'key': 1, 'value': 0.4},
{'key': 4, 'value': 0.5},
{'key': 6, 'value': 0.7},
{'key': 10, 'value': 1.3},
{'key': 11, 'value': 1.4},
{'key': 13, 'value': 1.1},
])
df.set_index('key', inplace=True)
I'd like to extract values that are either in the dataframe, or should be interpolated from existing values.
I'm aware of DataFrame.interpolate(), and it's perfect for quickly computing interpolated values for indices with NaN values. So, an approach could be to add all the indices that aren't already in the index, sort the dataframe by index, interpolate and then extract the values again. Something like:
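import numpy as np
indices = [3, 6, 9, 12, 15]
# one NaN row for every requested key that is not yet in the index
new_rows = pd.DataFrame([
    {'key': index, 'value': np.nan} for index in indices if index not in df.index
])
new_rows.set_index('key', inplace=True)
# pd.concat replaces DataFrame.append, which was removed in pandas 2.0
result = pd.concat([df, new_rows]).sort_index().interpolate(method='spline', order=2)
print(result['value'][indices])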
Result:
key
3 0.529559
6 0.700000
9 1.073190
12 1.252086
15 1.369036
Name: value, dtype: float64
However, the whole process of creating an additional dataframe, appending it to the original, sorting by index, calling .interpolate() on the whole result and then extracting the required values seems a lot more complicated than I'd expected.
I was hoping for something like:
# fictional, doesn't exist:
result = df.interpolated(indices) # a DataFrame with only the rows for given indices, interpolated as needed
print(result['value'])
Or:
# fictional, doesn't exist:
result = df['value'].interpolated(indices) # perhaps only on a Series
print(result)
Am I missing something obvious, and is similar functionality actually available? Or is my approach above already close to the best way to do it?
After posting, I found a somewhat nicer approach myself, but would still like to hear if someone knows of a more efficient, pythonic or simpler approach:
indices = [3, 6, 9, 12, 15]
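def interpolated(df, indices, *args, **kwargs):
    for index in indices:
        if index not in df.index:
            # add an all-NaN row for the missing key (DataFrame.append was removed in pandas 2.0)
            df = pd.concat([df, pd.DataFrame(np.nan, index=[index], columns=df.columns)])
    return df.sort_index().interpolate(*args, **kwargs).loc[indices]
print(interpolated(df, indices, 'spline', order=2))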
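For comparison, the same add-missing-keys, interpolate, select idea can be sketched without a loop by reindexing on the union of the existing and requested keys. This is only a sketch: interpolated_reindex is a hypothetical helper name, and method='index' (linear interpolation on the index values) is used here instead of the spline so that the example does not need SciPy. The key 15 is left out because it lies beyond the last data point, and how such keys are filled depends on the method chosen.

```python
import numpy as np
import pandas as pd

# Same data as in the question, with 'key' as the index.
df = pd.DataFrame(
    {'value': [0.4, 0.5, 0.7, 1.3, 1.4, 1.1]},
    index=pd.Index([1, 4, 6, 10, 11, 13], name='key'),
)

def interpolated_reindex(df, indices, **kwargs):
    """Hypothetical helper: rows for `indices`, interpolated where missing."""
    # The union is sorted, so reindexing inserts NaN rows in the right places.
    full_index = df.index.union(indices)
    return df.reindex(full_index).interpolate(**kwargs).loc[indices]

# Linear interpolation that respects the spacing of the keys.
result = interpolated_reindex(df, [3, 6, 9, 12], method='index')
print(result)
```

The original df is left unchanged, since reindex returns a new frame. Passing method='spline', order=2 instead should reproduce the approach above, assuming SciPy is installed.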
CodePudding user response:
You can use scipy's interp1d:
from scipy.interpolate import interp1d
interp = interp1d(df.index, df, axis=0)
interp([3,6,9])
Output (I duplicated the value column):
array([[0.46666667, 0.46666667],
[0.7 , 0.7 ],
[1.15 , 1.15 ]])
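Note that interp1d interpolates linearly by default, which is why these numbers differ from the spline results in the question. As a rough sketch of how this could be wrapped to return a labelled Series for the question's full list of keys: the fill_value='extrapolate' option is an assumption added here so that 15, which lies beyond the last key 13, does not raise an error.

```python
import pandas as pd
from scipy.interpolate import interp1d

# Same data as in the question, with 'key' as the index.
df = pd.DataFrame(
    {'value': [0.4, 0.5, 0.7, 1.3, 1.4, 1.1]},
    index=pd.Index([1, 4, 6, 10, 11, 13], name='key'),
)
indices = [3, 6, 9, 12, 15]

# kind='linear' is the default; fill_value='extrapolate' (an assumption, not in
# the answer above) extends the last segment so out-of-range keys are allowed.
interp = interp1d(df.index, df['value'], kind='linear', fill_value='extrapolate')

# Wrap the raw array in a Series so the values keep their keys.
result = pd.Series(interp(indices), index=pd.Index(indices, name='key'), name='value')
print(result)
```

Without fill_value='extrapolate', calling interp with a key outside the range 1 to 13 raises a ValueError.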