I've a dataset on the closing price of various stocks for about 5 years:
The different stocks are denoted by "SecuritiesCode". I would like to add a column to denote if there is an upcoming stock split / reverse stock split for a particular stock within the next 30 days (a stock split reverse stock split happens when "AdjustmentFactor" != 1)
I've tried to define a function to do this:
def pending_stocksplit(df):
stock_split_dates = df[df['AdjustmentFactor'] != 1]['Date'].tolist()
for i in stock_split_dates:
df[(df['Date'] >= i - timedelta(days=30)) & (df['Date'] <= i)]['pending_stocksplit'] = 1
return df
df = df.groupby('SecuritiesCode').apply(pending_stocksplit)
However, nothing was changed to my dataframe. What am I doing wrong within my function here?
P.S. This is actually a competition problem from kaggle
CodePudding user response:
I think this will do what you're asking:
import datetime
def pending_stocksplit(df):
#today = datetime.datetime.now()
today = datetime.datetime.strptime('2021-01-01','%Y-%m-%d')
daysAway = df.Date - today
df.loc[(df.AdjustmentFactor != 1) & (daysAway >= datetime.timedelta(days=0)) & (daysAway <= datetime.timedelta(days=30)),'pending_stocksplit'] = 1
return df
df = df.groupby('SecuritiesCode').apply(pending_stocksplit)
Sample input:
today is hardcoded to 2021-01-01
Date SecuritiesCode Close AdjustmentFactor
0 2017-01-04 1301 99.0 2.0
1 2017-01-04 1302 99.0 1.0
2 2017-01-04 1303 99.0 1.0
3 2017-01-04 1304 99.0 1.0
4 2019-01-04 1301 99.0 1.0
5 2019-01-04 1302 99.0 3.0
6 2019-01-04 1303 99.0 1.0
7 2019-01-04 1304 99.0 1.0
8 2021-01-04 1301 99.0 1.0
9 2021-01-04 1302 99.0 1.0
10 2021-01-04 1303 99.0 0.5
11 2021-01-04 1304 99.0 1.0
Output:
Date SecuritiesCode Close AdjustmentFactor pending_stocksplit
0 2017-01-04 1301 99.0 2.0 NaN
1 2017-01-04 1302 99.0 1.0 NaN
2 2017-01-04 1303 99.0 1.0 NaN
3 2017-01-04 1304 99.0 1.0 NaN
4 2019-01-04 1301 99.0 1.0 NaN
5 2019-01-04 1302 99.0 3.0 NaN
6 2019-01-04 1303 99.0 1.0 NaN
7 2019-01-04 1304 99.0 1.0 NaN
8 2021-01-04 1301 99.0 1.0 NaN
9 2021-01-04 1302 99.0 1.0 NaN
10 2021-01-04 1303 99.0 0.5 1.0
11 2021-01-04 1304 99.0 1.0 NaN
CodePudding user response:
I've found the very basic error that I'd made when it came to filtering the dates
Instead of
df[(df['Date'] >= i - timedelta(days=30)) & (df['Date'] <= i)]
it should have been
df[(df['Date'] >= (i - timedelta(days=30))) & (df['Date'] <= i)]