Home > Net >  How do I create a new column that references other row's data for its values?
How do I create a new column that references other row's data for its values?

Time:11-03

I have the following data frame:

Month Day Year Open High Low Close Week
0 1 1 2003 46.593 46.656 46.405 46.468 1
1 1 2 2003 46.538 46.66 46.47 46.673 1
2 1 3 2003 46.717 46.781 46.53 46.750 1
3 1 4 2003 46.815 46.843 46.68 46.750 1
4 1 5 2003 46.935 47.000 46.56 46.593 1
... ... ... ... ... ... ... ... ...
7257 10 26 2022 381.619 387.5799 381.350 382.019 43
7258 10 27 2022 383.07 385.00 379.329 379.98 43
7259 10 28 2022 379.869 389.519 379.67 389.019 43
7260 10 31 2022 386.44 388.399 385.26 386.209 44
7261 11 1 2022 390.14 390.39 383.29 384.519 44

I want to create a new column titled 'week high' which will reference each week every year and pull in the high. So for Week 1, Year 2003, it will take the Highest High from rows 0 to 4 but for Week 43, Year 2022, it will take the Highest High from rows 7257 to 7259.

Is it possible to reference the columns Week and Year to calculate that value? Thanks!

CodePudding user response:

Assuming , create a weekly period and use it as grouper for transform('max'):

group = pd.to_datetime(df[['Year', 'Month', 'Day']]).dt.to_period('W')
# or, if you already have a "Week" column
# group = "Week"


df['week_high'] = df.groupby(group)['High'].transform('max')

Output:

      Month  Day  Year     Open      High      Low    Close  Week  week_high
0         1    1  2003   46.593   46.6560   46.405   46.468   1.0     47.000
1         1    2  2003   46.538   46.6600   46.470   46.673   1.0     47.000
2         1    3  2003   46.717   46.7810   46.530   46.750   1.0     47.000
3         1    4  2003   46.815   46.8430   46.680   46.750   1.0     47.000
4         1    5  2003   46.935   47.0000   46.560   46.593   1.0     47.000
7257     10   26  2022  381.619  387.5799  381.350  382.019  43.0    389.519
7258     10   27  2022  383.070  385.0000  379.329  379.980  43.0    389.519
7259     10   28  2022  379.869  389.5190  379.670  389.019  43.0    389.519
7260     10   31  2022  386.440  388.3990  385.260  386.209  44.0    390.390
7261     11    1  2022  390.140  390.3900  383.290  384.519    44    390.390

CodePudding user response:

I am assuming you are using pandas. Other libraries will work similar.

Create a new DataFrame aggregated per week using groupby and join it back to your original DataFrame

df_grouped = df["Week", "High"].groupby("Week").max().rename(columns={"High":"Highest High"}
df_result = df.join(df_grouped, "Week")
  • Related