Home > Software design >  Historical minimum of timeseries data instead of .min in Panda
Historical minimum of timeseries data instead of .min in Panda

Time:10-06

I have a dataframe of different stocks and their stock price at different time points that looks like:

Stock_ID   Price   Date
1          45      2022-11-22
1          43      2022-11-21 
1          46      2022-11-20
1          49      2022-11-19
2          81      2021-10-19
2          79      2021-10-13
2          63      2021-10-09
2          74      2021-10-07
2          77      2021-09-30
3          32      2022-03-16
3          39      2022-02-14
3          30      2022-02-01

I want to add a new column Historical_low that computes the historical minimum of that stock, so the desired outcome looks like

Stock_ID   Price   Date        Historical_low
1          45      2022-11-22  43
1          43      2022-11-21  46
1          46      2022-11-20  49
1          49      2022-11-19  np.nan
2          81      2021-10-19  63
2          79      2021-10-13  63
2          63      2021-10-09  74
2          74      2021-10-07  77
2          77      2021-09-30  np.nan
3          32      2022-03-16  30
3          39      2022-02-14  30
3          30      2022-02-01  np.nan

Here is my attempt:

df.sort_values(['Stock_ID','Date], ascending=[True, False], inplace=True)
df['Historical_low'] = df.gropuby('Stock_ID')['Price'].min()

but that gives me the global minimum of the stock price for a particular stock and it means for some rows I am using information from the future.

Thank you for your help in advance.

CodePudding user response:

IIUC, you can use a shifted groupby.cummin:

df['Historical_low'] = (df
   .sort_values(by='Date', ascending=True)
   .groupby('Stock_ID', group_keys=False)['Price']
   .apply(lambda s: s.cummin().shift())
)

output:

    Stock_ID  Price       Date  Historical_low
0          1     45 2022-11-22            43.0
1          1     43 2022-11-21            46.0
2          1     46 2022-11-20            49.0
3          1     49 2022-11-19             NaN
4          2     81 2021-10-19            63.0
5          2     79 2021-10-13            63.0
6          2     63 2021-10-09            74.0
7          2     74 2021-10-07            77.0
8          2     77 2021-09-30             NaN
9          3     32 2022-03-16            30.0
10         3     39 2022-02-14            30.0
11         3     30 2022-02-01             NaN
  • Related