How to get prior close when you have all stocks in a single DF?-CodePudding

Sorry for the noob question. I have a bunch of stocks in a sqlite3 database:

import pandas as pd
import sqlite3, config

connection = sqlite3.connect(config.db_file)
connection.row_factory = sqlite3.Row


df = pd.read_sql('SELECT * FROM stock_price', connection)

# sort the dataframe
df.sort_values(by='stock_id', inplace=True)

# # set the index to be this and don't drop
df.set_index(keys=['stock_id'], drop=False,inplace=True)

When I print the df, it gives me the following (where each stock_id refers to a unique stock, e.g APPL):

           id  stock_id        date    open     high     low    close   volume
stock_id                                                                      
1           1         1  2022-08-02    9.83    9.845    9.83    9.830   584772
1           2         1  2022-08-03    9.84    9.860    9.84    9.820     7711
4           3         4  2022-08-03   10.38   10.380   10.38   10.380      199
5          46         5  2022-08-03   34.75   35.200   34.75   35.200     1007
5          45         5  2022-08-02   34.32   34.550   34.32   34.442     1252
...       ...       ...         ...     ...      ...     ...      ...      ...
98          8        98  2022-08-02   28.00   28.095   27.90   28.000     2417
99         71        99  2022-08-02   88.19   88.940   87.15   88.370  1045596
99         72        99  2022-08-03   88.34   88.550   87.65   88.410   982710
100       171       100  2022-08-02  117.58  120.010  117.08  119.270    67795
100       172       100  2022-08-03  119.80  121.940  120.60  121.440     4237

[178 rows x 8 columns]

I need to target each unique stock_id individually, and get the prior close.

I know if each stock was in its own separate dataframe, I could do something like this:

final_df['previous close'] = final_df['c'].shift()

But when I've tried that, because everything in one dataframe, then you get one stock getting the previous close of an entirely different stock, which isn't what I want.

So my question:

What's the best to achieve splitting out all these different stocks from one single dataframe and being able to target them individually, and get the previous close price of each stock?

CodePudding user response：

How about shift the date and merging, i.e.

# conversion (if not already datetime)
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')

# help dataframe
df_help = df.copy()
df_help['date']  = df_help['date']   pd.Timedelta(1, 'D')     # today's close is tomorrow's prev_close
df_help.rename(columns={'close' : 'prev_close'}, inplace=True)
df_help = df_help[['stock_id', 'date', 'prev_close']]

# merge
df = pd.merge(left=df, right=df_new, on=['stock_id', 'date'], how='left')

For sample data

df = pd.DataFrame({'stock_id'   : [1,1,2,2,3,3,4,4,5,5,6,6],
                   'date'  : ['2022-08-02', '2022-08-03', '2022-08-02', '2022-08-03', '2022-08-02', '2022-08-03', '2022-08-02', '2022-08-03', '2022-08-02', '2022-08-03', '2022-08-02', '2022-08-03'],
                   'open'  : [25.408, 19.859, 27.801, 9.548,  25.825, 22.746, 11.26,  17.841, 10.848, 4.354, 29.09,  13.561], 
                   'high'  : [38.343, 26.984, 33.553, 30.683, 30.87,  32.342, 20.318, 22.889, 18.122, 8.736, 18.894, 13.561],
                   'close' : [8.81,   23.385, 13.484, 19.834, 21.274, 28.743, 17.734, 20.824, 14.819, 8.736, 12.628,  5.739],})

this yields

stock_id       date    open    high   close  prev_close
0          1 2022-08-02  25.408  38.343   8.810         NaN
1          1 2022-08-03  19.859  26.984  23.385       8.810
2          2 2022-08-02  27.801  33.553  13.484         NaN
3          2 2022-08-03   9.548  30.683  19.834      13.484
4          3 2022-08-02  25.825  30.870  21.274         NaN
5          3 2022-08-03  22.746  32.342  28.743      21.274
6          4 2022-08-02  11.260  20.318  17.734         NaN
7          4 2022-08-03  17.841  22.889  20.824      17.734
8          5 2022-08-02  10.848  18.122  14.819         NaN
9          5 2022-08-03   4.354   8.736   8.736      14.819
10         6 2022-08-02  29.090  18.894  12.628         NaN
11         6 2022-08-03  13.561  13.561   5.739      12.628

CodePudding user response：

If I understand the question correctly, you just want the close of the previous row within each stock group. You can do that easily:

newdf = df.assign(prev_close=df.groupby(level=0)['close'].shift())

or, before setting the index to stock_id:

newdf = df.assign(prev_close=df.groupby('stock_id')['close'].shift())

On your data sample:

>>> newdf
          id   stock_id date         open    high     low     close    volume   prev_close
stock_id                                                                                  
1           1    1       2022-08-02    9.83    9.845    9.83    9.830   584772     NaN    
1           2    1       2022-08-03    9.84    9.860    9.84    9.820     7711    9.83    
4           3    4       2022-08-03   10.38   10.380   10.38   10.380      199     NaN    
5          46    5       2022-08-03   34.75   35.200   34.75   35.200     1007     NaN    
5          45    5       2022-08-02   34.32   34.550   34.32   34.442     1252   35.20    
98          8   98       2022-08-02   28.00   28.095   27.90   28.000     2417     NaN    
99         71   99       2022-08-02   88.19   88.940   87.15   88.370  1045596     NaN    
99         72   99       2022-08-03   88.34   88.550   87.65   88.410   982710   88.37    
100       171  100       2022-08-02  117.58  120.010  117.08  119.270    67795     NaN    
100       172  100       2022-08-03  119.80  121.940  120.60  121.440     4237  119.27

Side note: try to not use inplace=True all over the place. It will cause you trouble in the long run (see e.g. here).

Personally, I would prefer this format:

cleandf = df.assign(
    date=pd.to_datetime(df['date'])
).set_index(['stock_id', 'date']).sort_index()
newdf = cleandf.assign(prev_close=cleandf.groupby('stock_id')['close'].shift())
>>> newdf
                     id   open    high     low     close    volume   prev_close
stock_id date                                                                  
1        2022-08-02    1    9.83    9.845    9.83    9.830   584772      NaN   
         2022-08-03    2    9.84    9.860    9.84    9.820     7711    9.830   
4        2022-08-03    3   10.38   10.380   10.38   10.380      199      NaN   
5        2022-08-02   45   34.32   34.550   34.32   34.442     1252      NaN   
         2022-08-03   46   34.75   35.200   34.75   35.200     1007   34.442   
98       2022-08-02    8   28.00   28.095   27.90   28.000     2417      NaN   
99       2022-08-02   71   88.19   88.940   87.15   88.370  1045596      NaN   
         2022-08-03   72   88.34   88.550   87.65   88.410   982710   88.370   
100      2022-08-02  171  117.58  120.010  117.08  119.270    67795      NaN   
         2022-08-03  172  119.80  121.940  120.60  121.440     4237  119.270