Home > database >  Can I reference a prior row's value and populate it in the current row in a new column?
Can I reference a prior row's value and populate it in the current row in a new column?

Time:11-08

I have the following data frame:

Month Day Year Open High Low Close Week Close Week
0 1 1 2003 46.593 46.656 46.405 46.468 45.593 1
1 1 2 2003 46.538 46.66 46.47 46.673 45.593 1
2 1 3 2003 46.717 46.781 46.53 46.750 45.593 1
3 1 4 2003 46.815 46.843 46.68 46.750 45.593 1
4 1 5 2003 46.935 47.000 46.56 46.593 45.593 1
... ... ... ... ... ... ... ... ...
7257 10 26 2022 381.619 387.5799 381.350 382.019 389.019 43
7258 10 27 2022 383.07 385.00 379.329 379.98 389.019 43
7259 10 28 2022 379.869 389.519 379.67 389.019 389.019 43
7260 10 31 2022 386.44 388.399 385.26 386.209 385.24 44
7261 11 1 2022 390.14 390.39 383.29 384.519 385.24 44

I want to create a new column titled 'Prior_Week_Close' which will reference the prior week's 'Week Close' value (and the last week of the prior year for the first week of every year). For example, row 7260's value for Prior_Week_Close should equal 389.019

I'm trying: SPY['prior_week_close'] = np.where(SPY['Week'].shift(1) == (SPY['Week'] - 1), SPY['Week_Close'].shift(1), np.nan)

TypeError: boolean value of NA is ambiguous

I thought about just using shift and creating a new column but some weeks only have 4 days and that would lead to inaccurate values.

Any help is greatly appreciated!

CodePudding user response:

I was able to solve this by creating a new column called 'Overall_Week' (the week number in the entire data set, not just the calendar year) and using the following code:

def fn(s):

    result = SPY[SPY.Overall_Week == (s.iloc[0] - 1)]['Week_Close']


    if result.shape[0] > 0:
        return np.broadcast_to(result.iloc[0], s.shape)
    else:
        return np.broadcast_to(np.NaN, s.shape)
    
SPY['Prior_Week_Close'] = SPY.groupby('Overall_Week')['Overall_Week'].transform(fn)```
  • Related