Numpy interpolation on pandas TimeStamp data works if it's a pandas series but not if it's

Time:01-17

I'm trying to use np.interp to interpolate a float value based on pandas Timestamp data. However, I noticed that np.interp works if the input x is a pandas Series of Timestamps, but not if it's a single Timestamp object.

Here's the code to illustrate this:

import pandas as pd
import numpy as np
coarse = pd.DataFrame({'start': ['2016-01-01 07:00:00.00000+00:00',
                                 '2016-01-01 07:30:00.00000+00:00']})
fine = pd.DataFrame({'start': ['2016-01-01 07:00:02.156657+00:00',
                               '2016-01-01 07:00:15+00:00',
                               '2016-01-01 07:00:32+00:00',
                               '2016-01-01 07:11:17+00:00',
                               '2016-01-01 07:14:00+00:00',
                               '2016-01-01 07:15:55+00:00',
                               '2016-01-01 07:33:04+00:00'],
                     'price': [0, 1, 2, 3, 4, 5, 6]})
coarse['start'] = pd.to_datetime(coarse['start'])
fine['start'] = pd.to_datetime(fine['start'])
np.interp(x=coarse.start, xp=fine.start, fp=fine.price) # works
np.interp(x=coarse.start.iloc[-1], xp=fine.start, fp=fine.price)  # doesn't work

The latter gives the error

TypeError: float() argument must be a string or a number, not 'Timestamp'

I am wondering why the latter doesn't work, while the former does?

CodePudding user response:

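np.interp has to convert x to a float array. A Series of timestamps converts, but a bare Timestamp does not, so select with a list index, .iloc[[-1]], to get a one-element Series instead of a scalar: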

np.interp(x=coarse.start.iloc[[-1]], xp=fine.start, fp=fine.price)

Output: array([5.82118562])
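
If you would rather pass the scalar itself, one option is to turn every time into a plain float before calling interp. This is only a sketch: measuring in seconds from the first sample (the name origin below) is my own choice, not something interp requires, and the data is the question's sample rebuilt with pd.Timestamp so the example stands alone.

```python
import numpy as np
import pandas as pd

# Sample times and prices from the question, rebuilt as UTC Timestamps.
times = [
    "2016-01-01 07:00:02.156657",
    "2016-01-01 07:00:15",
    "2016-01-01 07:00:32",
    "2016-01-01 07:11:17",
    "2016-01-01 07:14:00",
    "2016-01-01 07:15:55",
    "2016-01-01 07:33:04",
]
fine = pd.DataFrame({
    "start": [pd.Timestamp(t, tz="UTC") for t in times],
    "price": [0, 1, 2, 3, 4, 5, 6],
})
query = pd.Timestamp("2016-01-01 07:30:00", tz="UTC")  # a single scalar

# Assumed convention: express every time as seconds since the first sample.
origin = fine["start"].iloc[0]
xp_seconds = (fine["start"] - origin).dt.total_seconds().to_numpy()
x_seconds = (query - origin).total_seconds()

# Both sides are plain floats now, so a scalar x is accepted.
price_at_query = np.interp(x_seconds, xp_seconds, fine["price"].to_numpy())
print(price_at_query)
```

Because the conversion is explicit, this does not depend on how numpy decides to cast a datetime Series to float.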

CodePudding user response:

Look at what you get when selecting an item from the Series:

In [8]: coarse.start
Out[8]: 
0   2016-01-01 07:00:00+00:00
1   2016-01-01 07:30:00+00:00
Name: start, dtype: datetime64[ns, UTC]

In [9]: coarse.start.iloc[-1]
Out[9]: Timestamp('2016-01-01 07:30:00+0000', tz='UTC')

With the list index, it's a Series:

In [10]: coarse.start.iloc[[-1]]
Out[10]: 
1   2016-01-01 07:30:00+00:00
Name: start, dtype: datetime64[ns, UTC]

I was going to scold you for not showing the full error message, but I see that the error is raised by compiled code. Keep in mind that interp is a numpy function: it works with numpy arrays and, for math like this, with arrays of float dtype.

So it's a good guess that interp is trying to make a float array from your argument.

In [14]: np.asarray(coarse.start, dtype=float)
Out[14]: array([1.4516316e+18, 1.4516334e+18])

In [15]: np.asarray(coarse.start.iloc[[1]], dtype=float)
Out[15]: array([1.4516334e+18])

In [16]: np.asarray(coarse.start.iloc[1], dtype=float)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[16], line 1
----> 1 np.asarray(coarse.start.iloc[1], dtype=float)

TypeError: float() argument must be a string or a number, not 'Timestamp'

It can't make a float value from a single pandas Timestamp object.
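
You can see the same failure without numpy at all. The sketch below calls float() on a Timestamp directly, then uses Timestamp.timestamp() as one explicit conversion; note that it returns seconds since the Unix epoch, while the arrays above hold nanoseconds. The variable names are mine, for illustration only.

```python
import pandas as pd

ts = pd.Timestamp("2016-01-01 07:30:00", tz="UTC")

# float() has no conversion rule for a Timestamp, so this raises TypeError.
try:
    float(ts)
    float_worked = True
except TypeError:
    float_worked = False

# An explicit conversion is unambiguous: seconds since the Unix epoch.
epoch_seconds = ts.timestamp()
print(float_worked, epoch_seconds)
```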
