Home > Blockchain >  Finding % increase for each month on average in a multiple year dataset
Finding % increase for each month on average in a multiple year dataset

Time:09-28

I have a dataset from yahoo finance that I'm trying to group by month (which I've done) and then plot the average % change for that month on a yearly cycle (data goes back 21 years). For example, I'd have 21 Januaries with average % increase/decrease for the sum of Januaries, 21 Februaries with average % increase/decrease for the sum of Febriaries, etc.

After around 2 hours of playing with things, I cannot get it clustered properly and it's driving me bonkers. I have tried using pd.Grouper and pd.groupby and variations of playing with datetime to isolate the month, or give the month an assigned number to group by, but I've had no luck.

My intended outcome that I'm looking for should look like (~ is placeholder for datetime YYYYMMDD format)

     Adj Close (%)  Month (int)
Date          8.5            1
~            -3.2            2
~            7.18            3
~           -1.8%            4
~            ...            ...

Using pd.to_clipboard to create this data (should be able to use pd.read_clipboard to read)

Sample of monthly data: (Date is index column, so this is functionally time series data)

           Adj Close
Date    
1999-09-30  NaN
1999-10-31  6.253954
1999-11-30  1.906186
1999-12-31  5.784389
2000-01-31  -5.090355
2000-02-29  -2.010808
2000-03-31  9.671983
2000-04-30  -3.079576
2000-05-31  -2.191505
2000-06-30  2.393355
2000-07-31  -1.634128
2000-08-31  6.069910
2000-09-30  -5.348297
2000-10-31  -0.494949
2000-11-30  -8.006861
2000-12-31  0.405345
2001-01-31  3.463658
2001-02-28  -9.229074
2001-03-31  -6.420471
2001-04-30  7.681436

CodePudding user response:

Are you looking for that:

dti = pd.date_range('1999', '2020', freq='M', name='Date')
df1 = pd.DataFrame({'Adj Close': np.random.randint(1, 30, len(dti))}, index=dti)

out = df1.groupby(df1.index.month)['Adj Close'] \
         .apply(lambda x: x.pct_change().mean()) \
         .rename_axis('Month').reset_index()

Output:

>>> out
    Month  Adj Close
0       1   1.185047
1       2   2.057344
2       3   1.116738
3       4   1.531147
4       5   0.981474
5       6   2.025258
6       7   0.914078
7       8   0.301812
8       9   0.710584
9      10   1.498942
10     11   1.910080
11     12   1.643428
  • Related