Home > Back-end >  How to get the difference between current time and 15s forward in pandas?
How to get the difference between current time and 15s forward in pandas?

Time:09-12

I have a pandas DataFrame, which stores stock price and time, time column's type is pd.datetime.

here is a demo:

import pandas as pd
df = pd.DataFrame([['2022-09-01 09:33:00', 100.], ['2022-09-01 09:33:14', 101.], ['2022-09-01 09:33:16', 99.4], ['2022-09-01 09:33:30', 100.9]], columns=['time', 'price'])
df['time'] = pd.to_datetime(df['time'])

In [11]: df
Out[11]: 
                 time  price
0 2022-09-01 09:33:00  100.0
1 2022-09-01 09:33:14  101.0
2 2022-09-01 09:33:16   99.4
3 2022-09-01 09:33:30  100.9

I want to calculate future return in 15s. (first price after 15 second - current price)

which I want is:

In [13]: df
Out[13]: 
                 time  price  return
0 2022-09-01 09:33:00  100.0    -0.6  // the future price is 99.4, period is 16s
1 2022-09-01 09:33:14  101.0    -0.1  // the future price is 100.9, period is 16s
2 2022-09-01 09:33:16   99.4     NaN
3 2022-09-01 09:33:30  100.9     NaN

I know df.diff can get difference in index, is there any good methods can do this?

CodePudding user response:

merge_asof to the rescue

Subtract a timedelta of 15s from the right dataframe then self merge on time using merge_asof with direction=forward which selects the first row in right dataframe whose on key is greater than or equal to the on key in the left dataframe then subtract the price column to calculate the return

df1 = pd.merge_asof(
    left=df,
    right=df.assign(time=df['time'] - pd.Timedelta('15s')),
    on='time', direction='forward', suffixes=['', '_r']
)

df1['return'] = df1.pop('price_r') - df1['price']

Result

                 time  price  return
0 2022-09-01 09:33:00  100.0    -0.6
1 2022-09-01 09:33:14  101.0    -0.1
2 2022-09-01 09:33:16   99.4     NaN
3 2022-09-01 09:33:30  100.9     NaN

CodePudding user response:

Please, try this (but I don't believe the output is very meaningful :-( ). Is it what you expected? (I realized this code assigned the return for the previous "15" seconds, not the next "15" seconds. But this is how the return is usually indexed - by the time when it is realized, not when it is still expected for the future).

import numpy as np
import pandas as pd

df = pd.DataFrame([['2022-09-01 09:33:00', 100.], ['2022-09-01 09:33:14', 101.], ['2022-09-01 09:33:16', 99.4], ['2022-09-01 09:33:30', 100.9]], columns=['time', 'price'])

df['time'] = pd.to_datetime(df['time'])
df = df.sort_values('time').reset_index(drop=True)

df.loc[:, 'return'] = df['price'].diff()

df['time_diff'] = df['time'].diff()

df['15sec_or_more'] = (df['time_diff'] >= np.timedelta64(15, 's'))

for k, i in enumerate(df.index):
    if k:
        if not df.loc[i,'15sec_or_more']:
            temp = df.iloc[k:].loc[:,['return','time_diff']].cumsum(axis=0)
            conds = (temp['time_diff'] >= np.timedelta64(15, 's'))
            if conds.sum():
                true_return_index = conds.idxmax()
                df.loc[i, 'return'] = df.loc[true_return_index, 'return']
            else:
                df.loc[i, 'return'] = np.nan

df = df[['time', 'price' ,'return']]
print(df)
  • Related