Separating data by date then applying a function-CodePudding

So I have a large table containing 3000 rows, it holds a bunch of trading information such as opening price, closing price, etc.

One of the columns is Datetime and an entry in this column would read: 2022-03-03 09:30:00-05:00. Each row is 5 minutes later than the previous, so in this case the next row after the one I just put would be 2022-03-03 09:35:00-05:00.

I have a function that I created in a separate file that calculates a value called RSI (what this means isn't important):

def RSI(p,q,window=10,signal_type='buy only'):
   
    
    gains = list(filter(lambda x: (x > 0), (p-q)))
    total_gains = sum(gains)
    avg_gain= total_gains/len(gains)
    
    loss = list(filter(lambda x: (x < 0), (p-q)))
    total_loss = (sum(loss))*-1
    avg_loss= total_loss/len(loss)
    
    RS=avg_gain/avg_loss
    rsi=100-(100/(1 RS))
    return (rsi)

This is the code I'm using to access my excel file which holds all the information and the code I'm using to create the summary table:

dat = pd.read_csv('AMD_5m.csv',index_col='Datetime',parse_dates=['Datetime'],
                  date_parser=lambda x: pd.to_datetime(x, utc=True))
dates = backtest.get_dates(dat.index)
#create a summary table
cols = ['Num. Obs.', 'Num. Trade', 'PnL', 'Win. Ratio','RSI'] #add addtional fields if necessary
summary_table = pd.DataFrame(index = dates, columns=cols)

I want to create some sort of code that separates the large dataset by date, so it would gather all the data from, for example, 2022-03-03 and then apply my function to calculate the RSI for that day.

I'll add all the code I've got in relation to my summary table so you can see the sort of format I'm using or if you just need it because I haven't explained something well:

dat = pd.read_csv('AMD_5m.csv',index_col='Datetime',parse_dates=['Datetime'],
                  date_parser=lambda x: pd.to_datetime(x, utc=True))
dates = backtest.get_dates(dat.index)
#create a summary table
cols = ['Num. Obs.', 'Num. Trade', 'PnL', 'Win. Ratio','Avg Opening','Avg Closing','RSI'] #add addtional fields if necessary
summary_table = pd.DataFrame(index = dates, columns=cols)
# loop backtest by dates
for d in dates:
    this_dat = dat.loc[dat.index.date==d]
    #find the number of observations in date d
    summary_table.loc[d]['Num. Obs.'] = this_dat.shape[0] 
    #get trading (i.e. position holding) signals
    signals = backtest.SMA(this_dat['Close'].values, window=10)
    #find the number of trades in date d
    summary_table.loc[d]['Num. Trade'] = np.sum(np.diff(signals)==1) 
    #find PnLs for 100 shares
    shares = 100
    PnL = -shares*np.sum(this_dat['Close'].values[1:]*np.diff(signals))
    if np.sum(np.diff(signals))>0:
        #close position at market close
        PnL  = shares*this_dat['Close'].values[-1]
    summary_table.loc[d]['PnL'] = PnL
    #find the win ratio
    ind_in = np.where(np.diff(signals)==1)[0] 1
    ind_out = np.where(np.diff(signals)==-1)[0] 1
    num_win = np.sum((this_dat['Close'].values[ind_out]-this_dat['Close'].values[ind_in])>0)
    if summary_table.loc[d]['Num. Trade']!=0:
        summary_table.loc[d]['Win. Ratio'] = 1. *num_win/summary_table.loc[d]['Num. Trade']
    dat['gain']=(dat['Close']/dat['Open'])-1
    this_dat['gain']=(this_dat['Close']/this_dat['Open'])-1

    
    summary_table.loc[d]['RSI'] = backtest.RSI(this_dat['Close'],this_dat['Open']) 
    #This only applies the function on one day, not all the days in the dataset#
    
summary_table.to_csv('AMD_5m_results.csv',index=True)

If anyone knows how I could do this please let me know.

(Note: If possible I would like to keep everything as is, if you need to, for example remove the time part of Datetime then please make a new variable/function to do this since I'm not supposed to change the code I'm provided with, which was everything up to for d in dates)

CodePudding user response：

I am not sure if I understood your problem correctly, but you can extract the Date of a Timestamp like this in a new column:

df['dates'] = df['Datetime'].dt.date

If you then want to apply a function for each day you can groupby this column:

df_result = df.groupby(['date']).apply(RSI,p=42,q=42)

Hope this helps :D