So I have a df like this:
import pandas as pd
import numpy as np
datatime = [('2019-09-15 00:15:00.000000000'),
('2019-09-15 00:30:00.000000000'),
('2019-09-15 00:45:00.000000000'),
('2019-09-15 01:00:00.000000000'),
('2019-09-15 01:15:00.000000000'),
('2019-09-15 01:30:00.000000000'),
('2019-09-15 01:45:00.000000000'),
('2019-09-15 02:00:00.000000000'),
('2019-09-15 02:15:00.000000000')]
p =[494.76,486.36,484.68,500.64,482.16,483.84,483.0,478.8,493.08,474.6]
q = [47.88,33.6,41.16,0.0,0.0,0.0,0.0,0.0,0.0,0.0]
df = pd.DataFrame(list(zip(datatime,p,q)), columns = [['datetime','p','q']])
df
I am doing an analysis for 30 days by grouping my data into on-peak and off-peak hours. For this I also need to identify the days of the week. I tried use the pandas
function:
df.dt.day_name()
But in this particular case it is not feasible since for this function the day start at 00:00:00
and at my program I need it to start at 00:15:00
.
Since I have 96 points for each day, I thought about using a dictionary:
days_of_the_week = {'Sunday': 1,'Monday': 2,'Tuesday': 3, 'Wednesday': 4, 'Thursday':5, 'Friday':6 , 'Saturday':7}
How can I apply it to my df so that every 96 points a new day is identified?
CodePudding user response:
You can just add an offset using a Timedelta
object when calculating the weekday. This won't affect the values of the datetime
column.
In [21]: dt_index = pd.date_range(start='2022-01-01', end='2022-01-01 23:45:00', periods=96)
In [23]: df = pd.DataFrame(zip(dt_index, np.random.rand(len(dt_index))), columns=['datetime', 'whatever'])
In [24]: df.tail()
Out[24]:
datetime whatever
91 2022-01-01 22:45:00 0.910446
92 2022-01-01 23:00:00 0.199106
93 2022-01-01 23:15:00 0.051808
94 2022-01-01 23:30:00 0.799284
95 2022-01-01 23:45:00 0.584663
In [25]: df['weekday'] = (df.datetime.astype('datetime64[ns]') pd.Timedelta(seconds=15*60)).dt.day_name()
In [26]: df.tail()
Out[26]:
datetime whatever weekday
91 2022-01-01 22:45:00 0.910446 Saturday
92 2022-01-01 23:00:00 0.199106 Saturday
93 2022-01-01 23:15:00 0.051808 Saturday
94 2022-01-01 23:30:00 0.799284 Saturday
95 2022-01-01 23:45:00 0.584663 Sunday
Just a note on the way you constructed you DataFrame
.
df = pd.DataFrame(list(zip(datatime,p,q)), columns = [['datetime','p','q']])
The use of list
is unnecessary and could impede performance for larger data sets. Additionally, you shouldn't use the the nested list for the columns
argument as it has unintended effects.
In [27]: df = pd.DataFrame(list(zip(datatime,p,q)), columns = [['datetime','p','q']])
In [28]: type(df.datetime)
Out[28]: pandas.core.frame.DataFrame
In [29]: df = pd.DataFrame(zip(datatime, p, q), columns=['datetime','p','q'])
In [30]: type(df.datetime)
Out[30]: pandas.core.series.Series