Resample time-series data at the end of the month and at the end of the day

Time:12-19

I have time-series data in the following format:

DateShort (%d/%m/%Y) TimeFrom TimeTo Value
1/1/2018 0:00 1:00 6414
1/1/2018 1:00 2:00 6153
... ... ... ...
1/1/2018 23:00 0:00 6317
2/1/2018 0:00 1:00 6046
... ... ... ...
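To experiment without downloading the file, the rows visible above can be loaded from an inline string. This is a minimal sketch that assumes the pastebin file is comma-separated with the header shown. DateShort is parsed day-first, so 2/1/2018 becomes 2 January 2018, not 1 February.

```python
import io

import pandas as pd

# The four rows shown above, written as CSV
# (assumption: the real file is comma-separated with this header).
csv_text = """DateShort,TimeFrom,TimeTo,Value
1/1/2018,0:00,1:00,6414
1/1/2018,1:00,2:00,6153
1/1/2018,23:00,0:00,6317
2/1/2018,0:00,1:00,6046
"""

df = pd.read_csv(io.StringIO(csv_text))

# Combine the day-first date with the start of each hourly interval.
df["DateTime"] = pd.to_datetime(
    df["DateShort"] + " " + df["TimeFrom"],
    format="%d/%m/%Y %H:%M",
)
df = df.drop(columns="DateShort").set_index("DateTime")
print(df)
```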

I would like to resample the data to month-end, taking the value at the end of the last day of each month (the 23:00 to 0:00 interval) rather than at its start.

The dataset can be retrieved from https://pastebin.com/raw/NWdigN97

pandas.DataFrame.resample() provides the 'M' (month-end) rule, but combined with .asfreq() it returns the row at the start of the month's last day (00:00) rather than the row at the end of that day.
See https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html
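A small synthetic series shows why this happens: resample(...).asfreq() only looks up the bin labels themselves, and month-end labels are midnight timestamps, whereas resample(...).last() takes the final observation inside each month. The hourly values 0, 1, 2, ... are made up for illustration. The sketch passes pd.offsets.MonthEnd() instead of the string 'M' because pandas 2.2 deprecated the 'M' alias in favour of 'ME'; the offset object works in either case.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly series for Jan-Feb 2018; values are just 0, 1, 2, ...
idx = pd.date_range("2018-01-01", "2018-02-28 23:00", freq=pd.offsets.Hour())
s = pd.Series(np.arange(len(idx)), index=idx)

# asfreq() reindexes at the bin labels, which are midnight timestamps
# (2018-01-31 00:00 and 2018-02-28 00:00).
at_label = s.resample(pd.offsets.MonthEnd()).asfreq()

# last() instead takes the final observation inside each monthly bin
# (the 23:00 reading of the month's last day).
in_bin = s.resample(pd.offsets.MonthEnd()).last()

print(at_label)
print(in_bin)
```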

Is there a better way to accomplish this?

I have the following sample code:

import numpy as np
import pandas as pd

ds_url = 'https://pastebin.com/raw/NWdigN97'

df = pd.read_csv(ds_url, header=0)

df['DateTime'] = pd.to_datetime(
    df['DateShort'] + ' ' + df['TimeFrom'],
    format='%d/%m/%Y %H:%M'
)

df.drop('DateShort', axis=1, inplace=True)
df.set_index('DateTime', inplace=True)

df.resample('M').asfreq()

The output is:

           TimeFrom TimeTo  Value
DateTime                         
2018-01-31     0:00   1:00   7215
2018-02-28     0:00   1:00   8580
2018-03-31     0:00   1:00   6202
2018-04-30     0:00   1:00   5369
2018-05-31     0:00   1:00   5840
2018-06-30     0:00   1:00   5730
2018-07-31     0:00   1:00   5979
2018-08-31     0:00   1:00   6009
2018-09-30     0:00   1:00   5430
2018-10-31     0:00   1:00   6587
2018-11-30     0:00   1:00   7948
2018-12-31     0:00   1:00   6193

However, the expected output is:

           TimeFrom TimeTo  Value
DateTime                            
2018-01-31  23:00   0:00    7605
2018-02-28  23:00   0:00    8790
2018-03-31  23:00   0:00    5967
2018-04-30  23:00   0:00    5595
2018-05-31  23:00   0:00    5558
2018-06-30  23:00   0:00    5153
2018-07-31  23:00   0:00    5996
2018-08-31  23:00   0:00    5757
2018-09-30  23:00   0:00    5785
2018-10-31  23:00   0:00    6437
2018-11-30  23:00   0:00    7830
2018-12-31  23:00   0:00    6767
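Each expected row is simply the one with the latest timestamp in its month. A minimal sketch of selecting those rows directly, on made-up hourly data; unlike a month-end resample, it keeps the real 23:00 timestamps in the index instead of relabelling them to midnight. Note that to_period("M") uses the period alias "M", which is separate from the deprecated offset alias.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly data for Jan-Mar 2018.
idx = pd.date_range("2018-01-01", "2018-03-31 23:00", freq=pd.offsets.Hour())
df = pd.DataFrame({"Value": np.arange(len(idx))}, index=idx)

# Latest timestamp present in each calendar month.
last_ts = df.index.to_series().groupby(df.index.to_period("M")).max()

# Select those rows; the index keeps the original ...23:00 timestamps.
month_end_rows = df.loc[last_ts.values]
print(month_end_rows)
```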

Answer:

Try this: group the rows by month-end with pd.Grouper and take the last value in each group:

df.groupby(pd.Grouper(freq='M')).last()

Output:

           TimeFrom TimeTo  Value
DateTime                         
2018-01-31    23:00   0:00   7605
2018-02-28    23:00   0:00   8790
2018-03-31    23:00   0:00   5967
2018-04-30    23:00   0:00   5595
2018-05-31    23:00   0:00   5558
2018-06-30    23:00   0:00   5153
2018-07-31    23:00   0:00   5996
2018-08-31    23:00   0:00   5757
2018-09-30    23:00   0:00   5785
2018-10-31    23:00   0:00   6437
2018-11-30    23:00   0:00   7830
2018-12-31    23:00   0:00   6707
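One caveat: GroupBy.last() returns the last non-null value of each column independently, so if the final hour of a month had a missing Value, the result would mix values from different rows; df.resample('M').last() behaves the same way. If the literal last row is wanted, groupby(...).tail(1) is one option, and it also keeps the original timestamp. A sketch with a hypothetical gap in the data:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the final hour of January has a missing Value.
idx = pd.date_range("2018-01-31 21:00", periods=3, freq=pd.offsets.Hour())
df = pd.DataFrame(
    {"TimeFrom": ["21:00", "22:00", "23:00"], "Value": [1.0, 2.0, np.nan]},
    index=idx,
)

# last() picks the last non-null value per column, so TimeFrom comes
# from the 23:00 row while Value comes from the 22:00 row.
by_last = df.groupby(pd.Grouper(freq=pd.offsets.MonthEnd())).last()

# tail(1) returns the literal last row of each month, NaN included,
# indexed by its original timestamp.
by_tail = df.groupby(df.index.to_period("M")).tail(1)

print(by_last)
print(by_tail)
```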