DataFrame: Rolling Year-to-Date cumulative STD (Time Series)

Time:01-30

I have a DataFrame that looks something like this:

            daily_return      year month day
date                                        
2018-12-27           NaN      2018    12  27
2018-12-28      1.020245      2018    12  28
2018-12-31      1.000650      2018    12  31
2019-01-02      1.020473      2019    01  02
2019-01-03      1.009129      2019    01  03
...                  ...       ...   ...  ..
2023-01-20      1.001087      2023    01  20
2023-01-23      0.998455      2023    01  23
2023-01-24      1.002491      2023    01  24
2023-01-25      1.004482      2023    01  25
2023-01-26      1.004953      2023    01  26

I want to calculate, for each date (each row), the year-to-date standard deviation of the daily_return column. Examples:

  • For the row '2018-12-31', the result would be the standard deviation of daily_return from '2018-01-01' up until '2018-12-31': [NaN, 1.020245, 1.000650].
  • For the row '2019-01-03', we are referring to the year 2019. Therefore, the standard deviation should only consider values from '2019-01-01' up until the current row's date, '2019-01-03': [1.020473, 1.009129].
  • For the row '2023-01-23', the result should use all daily_return values from '2023-01-01' up until '2023-01-23'.
  • For the row '2023-01-24', the result should use all daily_return values from '2023-01-01' up until '2023-01-24' ...

I managed to make something similar work for Month-to-Date cumprod by using:

df.groupby(['month', 'year'])['daily_return'].cumprod()

date
2018-12-27         NaN
2018-12-28    1.020245
2018-12-31    1.020909
2019-01-02    1.020473
2019-01-03    1.029789
                ...   
2023-01-20    0.999914
2023-01-23    0.998369
2023-01-24    1.000856
2023-01-25    1.005342
2023-01-26    1.010322

You can see it is working properly since, for every date, it gives me the Month-to-Date cumprod.

However, whenever I try grouping by year and calculating the std, it will only calculate the std for the entire year, and not for the Year-to-Date for each row:

df.groupby(['year'])['daily_return'].std()

year
2018    0.013856
2019    0.007128
2020    0.021325
2021    0.010358
2022    0.009854
2023    0.006444

CodePudding user response:

If you first sort the index so that the rows are in chronological order, you can then group by year, use an expanding window within each group, and apply the .std() function:

import pandas as pd

# convert the index to datetime if it isn't already
df.index = pd.to_datetime(df.index, format="%Y-%m-%d")

# sort the index
df.sort_index(inplace=True)
# calculate YTD standard deviation
df["YTD std"] = df.groupby(df.index.year)['daily_return'].expanding().std().droplevel(0)

df
#Out (computed on the ten sample rows shown in the question, so the first row of each year is NaN):
#            daily_return  year  month  day   YTD std
#date                                                
#2018-12-27           NaN  2018     12   27       NaN
#2018-12-28      1.020245  2018     12   28       NaN
#2018-12-31      1.000650  2018     12   31  0.013856
#2019-01-02      1.020473  2019      1    2       NaN
#2019-01-03      1.009129  2019      1    3  0.008021
#2023-01-20      1.001087  2023      1   20       NaN
#2023-01-23      0.998455  2023      1   23  0.001861
#2023-01-24      1.002491  2023      1   24  0.002049
#2023-01-25      1.004482  2023      1   25  0.002533
#2023-01-26      1.004953  2023      1   26  0.002650
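
To check the approach end to end, here is a minimal self-contained sketch. It assumes only the ten sample rows visible in the question (the elided middle rows are left out), and the variable names ytd_std and manual are illustrative, not from the original post. It rebuilds the frame, applies the same groupby/expanding/std chain, and compares one row against a direct slice-and-std calculation.

```python
import numpy as np
import pandas as pd

# Ten sample rows copied from the question; the elided rows are not included.
dates = pd.to_datetime([
    "2018-12-27", "2018-12-28", "2018-12-31", "2019-01-02", "2019-01-03",
    "2023-01-20", "2023-01-23", "2023-01-24", "2023-01-25", "2023-01-26",
])
returns = [np.nan, 1.020245, 1.000650, 1.020473, 1.009129,
           1.001087, 0.998455, 1.002491, 1.004482, 1.004953]
df = pd.DataFrame({"daily_return": returns}, index=dates)
df.index.name = "date"
df = df.sort_index()

# Group by calendar year, then take an expanding (cumulative) sample
# standard deviation within each year.
ytd_std = (
    df.groupby(df.index.year)["daily_return"]
    .expanding()
    .std()
    .droplevel(0)  # drop the year level added by groupby so the index matches df
)
df["YTD std"] = ytd_std

# Cross-check one row by hand: std of every 2023 return up to 2023-01-24.
manual = df.loc["2023-01-01":"2023-01-24", "daily_return"].std()
print(df)
print(manual, df.loc[pd.Timestamp("2023-01-24"), "YTD std"])
```

Both .std() calls use pandas' default of ddof=1 (sample standard deviation), which is why a year's first valid observation yields NaN. If a population standard deviation is wanted instead, passing ddof=0 to .std() should give it.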