df: (DataFrame)
Open High Close Volume
2020/1/1 1 2 3 323232
2020/1/2 2 3 4 321321
....
2020/12/31 4 5 6 123213
....
2021
The performance i needed is : (Graph NO.1)
Open High Close Volume Year_Sum_Volume
2020/1/1 1 2 3 323232 (323232 321321 .... 123213)
2020/1/2 2 3 4 321321 (323232 321321 .... 123213)
....
2020/12/31 4 5 6 123213 (323232 321321 .... 123213)
....
2021 (x x x.....x)
I want a sum of Volume in different year (the Year_Sum_Volume is the volume of each year) This is the code i try to calculate the sum of volume in each year but how can i add this data to daily data , i want to add Year_Sum_Volume to df,like(Graph no.1)
df.resample('Y', on='Date')['Volume'].sum()
thanks you for answering
CodePudding user response:
I believe groupby.sum()
and merge should be your friends
import pandas as pd
df = pd.DataFrame({"date":['2021-12-30', '2021-12-31', '2022-01-01'], "a":[1,2.1,3.2]})
df.date = pd.to_datetime(df.date)
df["year"] = df.date.dt.year
df_sums = df.groupby("year").sum().rename(columns={"a":"a_sum"})
df = df.merge(df_sums, right_index=True, left_on="year")
which gives:
date | a | year | a_sum | |
---|---|---|---|---|
0 | 2021-12-30 00:00:00 | 1 | 2021 | 3.1 |
1 | 2021-12-31 00:00:00 | 2.1 | 2021 | 3.1 |
2 | 2022-01-01 00:00:00 | 3.2 | 2022 | 3.2 |
CodePudding user response:
Based on your output, Year_Sum_Volume is the same value for every row and can be calculated using df['Volume'].sum()
.
Then you join a column of a scaled list:
df.join(pd.DataFrame( {'Year_Sum_Volume': [your_sum_val] * len(df['Volume'])} ))
CodePudding user response:
Try below code (after converting date column to pd.to_datetime)
df.assign(Year_Sum_Volume = df.groupby(df['date'].dt.year)['a'].transform('sum'))