Python pandas- datetime for Peak hour and peak off Hour analysis-CodePudding

So I have a df like this:

import pandas as pd
import numpy as np

datatime = [('2019-09-15 00:15:00.000000000'),
            ('2019-09-15 00:30:00.000000000'),
            ('2019-09-15 00:45:00.000000000'),
            ('2019-09-15 01:00:00.000000000'),
            ('2019-09-15 01:15:00.000000000'),
            ('2019-09-15 01:30:00.000000000'),
            ('2019-09-15 01:45:00.000000000'),
            ('2019-09-15 02:00:00.000000000'),
            ('2019-09-15 02:15:00.000000000')]
p =[494.76,486.36,484.68,500.64,482.16,483.84,483.0,478.8,493.08,474.6]
q = [47.88,33.6,41.16,0.0,0.0,0.0,0.0,0.0,0.0,0.0]

df = pd.DataFrame(list(zip(datatime,p,q)), columns = [['datetime','p','q']])
df

I am doing an analysis for 30 days by grouping my data into on-peak and off-peak hours. For this I also need to identify the days of the week. I tried use the pandas function:

df.dt.day_name()

But in this particular case it is not feasible since for this function the day start at 00:00:00 and at my program I need it to start at 00:15:00. Since I have 96 points for each day, I thought about using a dictionary:

days_of_the_week = {'Sunday': 1,'Monday': 2,'Tuesday': 3, 'Wednesday': 4, 'Thursday':5, 'Friday':6 , 'Saturday':7}

How can I apply it to my df so that every 96 points a new day is identified?

CodePudding user response：

You can just add an offset using a Timedelta object when calculating the weekday. This won't affect the values of the datetime column.

In [21]: dt_index = pd.date_range(start='2022-01-01', end='2022-01-01 23:45:00', periods=96)

In [23]: df = pd.DataFrame(zip(dt_index, np.random.rand(len(dt_index))), columns=['datetime', 'whatever'])

In [24]: df.tail()
Out[24]:
              datetime  whatever
91 2022-01-01 22:45:00  0.910446
92 2022-01-01 23:00:00  0.199106
93 2022-01-01 23:15:00  0.051808
94 2022-01-01 23:30:00  0.799284
95 2022-01-01 23:45:00  0.584663

In [25]: df['weekday'] = (df.datetime.astype('datetime64[ns]')   pd.Timedelta(seconds=15*60)).dt.day_name()

In [26]: df.tail()
Out[26]:
              datetime  whatever   weekday
91 2022-01-01 22:45:00  0.910446  Saturday
92 2022-01-01 23:00:00  0.199106  Saturday
93 2022-01-01 23:15:00  0.051808  Saturday
94 2022-01-01 23:30:00  0.799284  Saturday
95 2022-01-01 23:45:00  0.584663    Sunday

Just a note on the way you constructed you DataFrame.

df = pd.DataFrame(list(zip(datatime,p,q)), columns = [['datetime','p','q']])

The use of list is unnecessary and could impede performance for larger data sets. Additionally, you shouldn't use the the nested list for the columns argument as it has unintended effects.

In [27]: df = pd.DataFrame(list(zip(datatime,p,q)), columns = [['datetime','p','q']])

In [28]: type(df.datetime)
Out[28]: pandas.core.frame.DataFrame

In [29]: df = pd.DataFrame(zip(datatime, p, q), columns=['datetime','p','q'])

In [30]: type(df.datetime)
Out[30]: pandas.core.series.Series