Home > database >  How to calculate monthly changes in a time series using pandas dataframe
How to calculate monthly changes in a time series using pandas dataframe

Time:05-14

As I am new to Python I am probably asking for something basic for most of you. However, I have a df where 'Date' is the index, another column that is returning the month related to the Date, and one Data column.

            Mnth    TSData
Date
2012-01-05     1  192.6257
2012-01-12     1  194.2714
2012-01-19     1  192.0086
2012-01-26     1  186.9729
2012-02-02     2  183.7700
2012-02-09     2  178.2343
2012-02-16     2  172.3429
2012-02-23     2  171.7800
2012-03-01     3  169.6300
2012-03-08     3  168.7386
2012-03-15     3  167.1700
2012-03-22     3  165.9543
2012-03-29     3  165.0771
2012-04-05     4  164.6371
2012-04-12     4  164.6500
2012-04-19     4  166.9171
2012-04-26     4  166.4514
2012-05-03     5  166.3657
2012-05-10     5  168.2543
2012-05-17     5  176.8271
2012-05-24     5  179.1971
2012-05-31     5  183.7120
2012-06-07     6  195.1286

I wish to calculate monthly changes in the data set that I can later use in a boxplot. So from the table above the results i seek are:

     Mnth               Chng
1    -8,9  (183,77 - 192,66)
2  -14,14  (169,63 - 183,77)
3      -5  (164,63 - 169,63)
4    1,73  (166,36 - 164,63)
5   28,77  (195,13 - 166,36)  

and so on...

any suggestions?

thanks :)

CodePudding user response:

IIUC, starting from this as df:

          Date  Mnth    TSData
0   2012-01-05     1  192.6257
1   2012-01-12     1  194.2714
2   2012-01-19     1  192.0086
3   2012-01-26     1  186.9729
4   2012-02-02     2  183.7700
...
20  2012-05-24     5  179.1971
21  2012-05-31     5  183.7120
22  2012-06-07     6  195.1286

you can use:

df.groupby('Mnth')['TSData'].first().diff().shift(-1)

# or 
# -df.groupby('Mnth')['TSData'].first().diff(-1)

NB. the data must be sorted by date to have the desired date to be used in the computation as the first item of each group (df.sort_values(by=['Mnth', 'Date']))

output:

Mnth
1    -8.8557
2   -14.1400
3    -4.9929
4     1.7286
5    28.7629
6        NaN
Name: TSData, dtype: float64

CodePudding user response:

  1. I'll verify that we have a datetime index:
df.index = pd.to_datetime(df.index)
  1. Then it's simply a matter of using resample:
df['TSData'].resample('M').first().diff().shift(freq='-1M')

Output:

Date
2011-12-31        NaN
2012-01-31    -8.8557
2012-02-29   -14.1400
2012-03-31    -4.9929
2012-04-30     1.7286
2012-05-31    28.7629
Name: TSData, dtype: float64
  • Related