I have a dataframe of different stocks and their stock price at different time points that looks like:
Stock_ID Price Date
1 45 2022-11-22
1 43 2022-11-21
1 46 2022-11-20
1 49 2022-11-19
2 81 2021-10-19
2 79 2021-10-13
2 63 2021-10-09
2 74 2021-10-07
2 77 2021-09-30
3 32 2022-03-16
3 39 2022-02-14
3 30 2022-02-01
I want to add a new column Historical_low
that computes the historical minimum of that stock, so the desired outcome looks like
Stock_ID Price Date Historical_low
1 45 2022-11-22 43
1 43 2022-11-21 46
1 46 2022-11-20 49
1 49 2022-11-19 np.nan
2 81 2021-10-19 63
2 79 2021-10-13 63
2 63 2021-10-09 74
2 74 2021-10-07 77
2 77 2021-09-30 np.nan
3 32 2022-03-16 30
3 39 2022-02-14 30
3 30 2022-02-01 np.nan
Here is my attempt:
df.sort_values(['Stock_ID','Date], ascending=[True, False], inplace=True)
df['Historical_low'] = df.gropuby('Stock_ID')['Price'].min()
but that gives me the global minimum of the stock price for a particular stock and it means for some rows I am using information from the future.
Thank you for your help in advance.
CodePudding user response:
IIUC, you can use a shifted groupby.cummin
:
df['Historical_low'] = (df
.sort_values(by='Date', ascending=True)
.groupby('Stock_ID', group_keys=False)['Price']
.apply(lambda s: s.cummin().shift())
)
output:
Stock_ID Price Date Historical_low
0 1 45 2022-11-22 43.0
1 1 43 2022-11-21 46.0
2 1 46 2022-11-20 49.0
3 1 49 2022-11-19 NaN
4 2 81 2021-10-19 63.0
5 2 79 2021-10-13 63.0
6 2 63 2021-10-09 74.0
7 2 74 2021-10-07 77.0
8 2 77 2021-09-30 NaN
9 3 32 2022-03-16 30.0
10 3 39 2022-02-14 30.0
11 3 30 2022-02-01 NaN