I have the following data frame:
Month | Day | Year | Open | High | Low | Close | Week Close | Week | |
---|---|---|---|---|---|---|---|---|---|
0 | 1 | 1 | 2003 | 46.593 | 46.656 | 46.405 | 46.468 | 45.593 | 1 |
1 | 1 | 2 | 2003 | 46.538 | 46.66 | 46.47 | 46.673 | 45.593 | 1 |
2 | 1 | 3 | 2003 | 46.717 | 46.781 | 46.53 | 46.750 | 45.593 | 1 |
3 | 1 | 4 | 2003 | 46.815 | 46.843 | 46.68 | 46.750 | 45.593 | 1 |
4 | 1 | 5 | 2003 | 46.935 | 47.000 | 46.56 | 46.593 | 45.593 | 1 |
... | ... | ... | ... | ... | ... | ... | ... | ... | |
7257 | 10 | 26 | 2022 | 381.619 | 387.5799 | 381.350 | 382.019 | 389.019 | 43 |
7258 | 10 | 27 | 2022 | 383.07 | 385.00 | 379.329 | 379.98 | 389.019 | 43 |
7259 | 10 | 28 | 2022 | 379.869 | 389.519 | 379.67 | 389.019 | 389.019 | 43 |
7260 | 10 | 31 | 2022 | 386.44 | 388.399 | 385.26 | 386.209 | 385.24 | 44 |
7261 | 11 | 1 | 2022 | 390.14 | 390.39 | 383.29 | 384.519 | 385.24 | 44 |
I want to create a new column titled 'Prior_Week_Close' which will reference the prior week's 'Week Close' value (and the last week of the prior year for the first week of every year). For example, row 7260's value for Prior_Week_Close should equal 389.019
I'm trying:
SPY['prior_week_close'] = np.where(SPY['Week'].shift(1) == (SPY['Week'] - 1), SPY['Week_Close'].shift(1), np.nan)
TypeError: boolean value of NA is ambiguous
I thought about just using shift and creating a new column but some weeks only have 4 days and that would lead to inaccurate values.
Any help is greatly appreciated!
CodePudding user response:
I was able to solve this by creating a new column called 'Overall_Week' (the week number in the entire data set, not just the calendar year) and using the following code:
def fn(s):
result = SPY[SPY.Overall_Week == (s.iloc[0] - 1)]['Week_Close']
if result.shape[0] > 0:
return np.broadcast_to(result.iloc[0], s.shape)
else:
return np.broadcast_to(np.NaN, s.shape)
SPY['Prior_Week_Close'] = SPY.groupby('Overall_Week')['Overall_Week'].transform(fn)```