How do I create a new column that references other row's data for its values?-CodePudding

I have the following data frame:

	Month	Day	Year	Open	High	Low	Close	Week
0	1	1	2003	46.593	46.656	46.405	46.468	1
1	1	2	2003	46.538	46.66	46.47	46.673	1
2	1	3	2003	46.717	46.781	46.53	46.750	1
3	1	4	2003	46.815	46.843	46.68	46.750	1
4	1	5	2003	46.935	47.000	46.56	46.593	1
...	...	...	...	...	...	...	...	...
7257	10	26	2022	381.619	387.5799	381.350	382.019	43
7258	10	27	2022	383.07	385.00	379.329	379.98	43
7259	10	28	2022	379.869	389.519	379.67	389.019	43
7260	10	31	2022	386.44	388.399	385.26	386.209	44
7261	11	1	2022	390.14	390.39	383.29	384.519	44

I want to create a new column titled 'week high' which will reference each week every year and pull in the high. So for Week 1, Year 2003, it will take the Highest High from rows 0 to 4 but for Week 43, Year 2022, it will take the Highest High from rows 7257 to 7259.

Is it possible to reference the columns Week and Year to calculate that value? Thanks!

CodePudding user response：

Assuming pandas, create a weekly period and use it as grouper for transform('max'):

group = pd.to_datetime(df[['Year', 'Month', 'Day']]).dt.to_period('W')
# or, if you already have a "Week" column
# group = "Week"


df['week_high'] = df.groupby(group)['High'].transform('max')

Output:

      Month  Day  Year     Open      High      Low    Close  Week  week_high
0         1    1  2003   46.593   46.6560   46.405   46.468   1.0     47.000
1         1    2  2003   46.538   46.6600   46.470   46.673   1.0     47.000
2         1    3  2003   46.717   46.7810   46.530   46.750   1.0     47.000
3         1    4  2003   46.815   46.8430   46.680   46.750   1.0     47.000
4         1    5  2003   46.935   47.0000   46.560   46.593   1.0     47.000
7257     10   26  2022  381.619  387.5799  381.350  382.019  43.0    389.519
7258     10   27  2022  383.070  385.0000  379.329  379.980  43.0    389.519
7259     10   28  2022  379.869  389.5190  379.670  389.019  43.0    389.519
7260     10   31  2022  386.440  388.3990  385.260  386.209  44.0    390.390
7261     11    1  2022  390.140  390.3900  383.290  384.519    44    390.390

CodePudding user response：

I am assuming you are using pandas. Other libraries will work similar.

Create a new DataFrame aggregated per week using groupby and join it back to your original DataFrame

df_grouped = df["Week", "High"].groupby("Week").max().rename(columns={"High":"Highest High"}
df_result = df.join(df_grouped, "Week")