Home > Back-end >  A Yearly data to Daily data in Python
A Yearly data to Daily data in Python

Time:08-03

 df: (DataFrame)
         Open High Close Volume   
2020/1/1  1    2    3     323232
2020/1/2  2    3    4     321321
....
2020/12/31 4   5    6     123213
....
2021

The performance i needed is : (Graph NO.1)

        Open High Close Volume  Year_Sum_Volume
2020/1/1  1   2     3    323232   (323232   321321  ....  123213)
2020/1/2  2   3     4    321321   (323232   321321  ....  123213)
....
2020/12/31 4  5     6    123213   (323232   321321  ....  123213)
....
2021                              (x x x.....x)

I want a sum of Volume in different year (the Year_Sum_Volume is the volume of each year) This is the code i try to calculate the sum of volume in each year but how can i add this data to daily data , i want to add Year_Sum_Volume to df,like(Graph no.1)

df.resample('Y', on='Date')['Volume'].sum()

thanks you for answering

CodePudding user response:

I believe groupby.sum() and merge should be your friends

import pandas as pd
df = pd.DataFrame({"date":['2021-12-30', '2021-12-31', '2022-01-01'], "a":[1,2.1,3.2]})
df.date = pd.to_datetime(df.date)
df["year"] = df.date.dt.year
df_sums = df.groupby("year").sum().rename(columns={"a":"a_sum"})
df = df.merge(df_sums, right_index=True, left_on="year")

which gives:

date a year a_sum
0 2021-12-30 00:00:00 1 2021 3.1
1 2021-12-31 00:00:00 2.1 2021 3.1
2 2022-01-01 00:00:00 3.2 2022 3.2

CodePudding user response:

Based on your output, Year_Sum_Volume is the same value for every row and can be calculated using df['Volume'].sum().

Then you join a column of a scaled list:

df.join(pd.DataFrame( {'Year_Sum_Volume': [your_sum_val] * len(df['Volume'])} ))

CodePudding user response:

Try below code (after converting date column to pd.to_datetime)

df.assign(Year_Sum_Volume = df.groupby(df['date'].dt.year)['a'].transform('sum'))
  • Related