Home > Back-end >  Assign value for x days prior to a date with groupby
Assign value for x days prior to a date with groupby

Time:06-26

I've a dataset on the closing price of various stocks for about 5 years:

enter image description here

The different stocks are denoted by "SecuritiesCode". I would like to add a column to denote if there is an upcoming stock split / reverse stock split for a particular stock within the next 30 days (a stock split reverse stock split happens when "AdjustmentFactor" != 1)

I've tried to define a function to do this:

def pending_stocksplit(df):
  stock_split_dates = df[df['AdjustmentFactor'] != 1]['Date'].tolist()
  for i in stock_split_dates:
    df[(df['Date'] >= i - timedelta(days=30)) & (df['Date'] <= i)]['pending_stocksplit'] = 1
  return df

df = df.groupby('SecuritiesCode').apply(pending_stocksplit)

However, nothing was changed to my dataframe. What am I doing wrong within my function here?

P.S. This is actually a competition problem from kaggle

CodePudding user response:

I think this will do what you're asking:

import datetime
def pending_stocksplit(df):
    #today = datetime.datetime.now()
    today = datetime.datetime.strptime('2021-01-01','%Y-%m-%d')
    daysAway = df.Date - today
    df.loc[(df.AdjustmentFactor != 1) & (daysAway >= datetime.timedelta(days=0)) & (daysAway <= datetime.timedelta(days=30)),'pending_stocksplit'] = 1
    return df

df = df.groupby('SecuritiesCode').apply(pending_stocksplit)

Sample input:

today is hardcoded to 2021-01-01

         Date  SecuritiesCode  Close  AdjustmentFactor
0  2017-01-04            1301   99.0               2.0
1  2017-01-04            1302   99.0               1.0
2  2017-01-04            1303   99.0               1.0
3  2017-01-04            1304   99.0               1.0
4  2019-01-04            1301   99.0               1.0
5  2019-01-04            1302   99.0               3.0
6  2019-01-04            1303   99.0               1.0
7  2019-01-04            1304   99.0               1.0
8  2021-01-04            1301   99.0               1.0
9  2021-01-04            1302   99.0               1.0
10 2021-01-04            1303   99.0               0.5
11 2021-01-04            1304   99.0               1.0

Output:

         Date  SecuritiesCode  Close  AdjustmentFactor  pending_stocksplit
0  2017-01-04            1301   99.0               2.0                 NaN
1  2017-01-04            1302   99.0               1.0                 NaN
2  2017-01-04            1303   99.0               1.0                 NaN
3  2017-01-04            1304   99.0               1.0                 NaN
4  2019-01-04            1301   99.0               1.0                 NaN
5  2019-01-04            1302   99.0               3.0                 NaN
6  2019-01-04            1303   99.0               1.0                 NaN
7  2019-01-04            1304   99.0               1.0                 NaN
8  2021-01-04            1301   99.0               1.0                 NaN
9  2021-01-04            1302   99.0               1.0                 NaN
10 2021-01-04            1303   99.0               0.5                 1.0
11 2021-01-04            1304   99.0               1.0                 NaN

CodePudding user response:

I've found the very basic error that I'd made when it came to filtering the dates

Instead of

df[(df['Date'] >= i - timedelta(days=30)) & (df['Date'] <= i)]

it should have been

df[(df['Date'] >= (i - timedelta(days=30))) & (df['Date'] <= i)]
  • Related