Python Pandas Period Strings do not work on minutes

My df looks like this:

                   timestamp       power
0        2022-01-01 00:00:00  100.000000
1        2022-01-01 00:00:01  100.004526
2        2022-01-01 00:00:02  100.009053
3        2022-01-01 00:00:03  100.013579
4        2022-01-01 00:00:04  100.018105
...                      ...         ...
31535995 2022-12-31 23:59:55  136.750000
31535996 2022-12-31 23:59:56  136.560000
31535997 2022-12-31 23:59:57  136.440000
31535998 2022-12-31 23:59:58  136.380000
31535999 2022-12-31 23:59:59  136.530000

[31536000 rows x 2 columns]

I have a super simple script:

import pandas as pd

directory = 'data/peak_shaving/20220803_132445'
df = pd.read_csv(f'{directory}/demand_profile_simulation.csv')
df['timestamp'] = pd.to_datetime(df['timestamp'])
# Intended: mean power per 15-minute interval
df = df.groupby(pd.PeriodIndex(df['timestamp'], freq="15min"))['power'].mean()

The result of this is:

timestamp
2022-01-01 00:00    100.133526
2022-01-01 00:01    100.405105
2022-01-01 00:02    100.676684
2022-01-01 00:03    100.948263
2022-01-01 00:04    101.219842
                       ...    
2022-12-31 23:55    153.952833
2022-12-31 23:56    150.040333
2022-12-31 23:57    146.124167
2022-12-31 23:58    142.225833
2022-12-31 23:59    138.318167
Freq: 15T, Name: power, Length: 525600, dtype: float64

As you can see, it is grouped by minute, not in 15-minute intervals. When I try another freq, such as one day, it works perfectly:

2022-01-01    120.291041
2022-01-02    126.085428
2022-01-03    120.840020
2022-01-04    124.335800
2022-01-05    119.230694
                 ...    
2022-12-27    125.802254
2022-12-28    123.833951
2022-12-29    126.609810
2022-12-30    123.971885
2022-12-31    122.798069
Freq: D, Name: power, Length: 365, dtype: float64

I also tested hours and many other frequencies and they work, but I cannot make it work for 15-minute intervals. Is there an issue in my code? Thanks

CodePudding user response：

Your solution works correctly for me. Here is an alternative with Series.dt.to_period:

df = pd.read_csv(f'{directory}/demand_profile_simulation.csv', parse_dates=['timestamp'])
df = df.groupby(df['timestamp'].dt.to_period('15Min'))['power'].mean()

Other solutions:

df = pd.read_csv(f'{directory}/demand_profile_simulation.csv', parse_dates=['timestamp'])
df = df.groupby(pd.Grouper(key='timestamp', freq="15min"))['power'].mean()
# alternative
# df = df.resample("15min", on='timestamp')['power'].mean()
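
A minimal sketch for checking the Grouper approach without the CSV from the question. It assumes a hypothetical one-hour frame with one sample per second (the `demo` frame and its values are made up for illustration), and compares Grouper with flooring the timestamps to 15-minute boundaries before grouping:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV: one hour of 1-second samples.
ts = pd.to_datetime("2022-01-01") + pd.to_timedelta(np.arange(3600), unit="s")
demo = pd.DataFrame({"timestamp": ts, "power": np.linspace(100.0, 101.0, 3600)})

# Bin into fixed 15-minute windows aligned to the clock (00:00, 00:15, ...).
by_grouper = demo.groupby(pd.Grouper(key="timestamp", freq="15min"))["power"].mean()

# Same idea by hand: floor each timestamp to its 15-minute boundary, then group.
by_floor = demo.groupby(demo["timestamp"].dt.floor("15min"))["power"].mean()

print(by_grouper)
print(by_floor)
```

Both results should contain one row per 15-minute window. The output in the question (525600 groups with Freq: 15T) looks consistent with periods that keep their start at minute resolution instead of being snapped to a 15-minute grid, so flooring or using Grouper/resample may be the more predictable choice here.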

CodePudding user response：

You can look at the documentation for pandas.date_range: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.date_range.html

I think this may help.

Example:

# pandas >= 1.4: use inclusive='left' (the older closed='left' was removed in pandas 2.0)
pd.Series(pd.date_range(
    '1/1/2020', '1/2/2020', freq='15min', inclusive='left')).dt.time
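
To see what such a 15-minute grid looks like, here is a small sketch. It assumes pandas 1.4 or later, where the keyword for excluding an endpoint in date_range is `inclusive`; the dates are only examples:

```python
import pandas as pd

# Start times of every 15-minute slot on one example day; the right end
# (midnight of the next day) is excluded by inclusive="left".
slots = pd.date_range("2020-01-01", "2020-01-02", freq="15min", inclusive="left")

print(len(slots))  # number of 15-minute slots in the day
print(slots[:3])   # first few slot start times
```

These slot starts are the same boundaries that Grouper or resample with freq="15min" bin against by default.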