I have a datetime data to a minute (sample below)
2021-11-08 00:10:00
2021-11-08 01:10:00
2021-11-08 02:25:00
2021-11-08 03:55:00
2021-11-08 06:55:00
2021-11-08 12:35:00
2021-11-08 16:05:00
2021-11-08 17:10:00
2021-11-08 18:45:00
2021-11-08 19:10:00
2021-11-08 20:25:00
2021-11-08 20:55:00
2021-11-08 22:55:00
and I need to assign a custom time slot below to that data set. some slots start at a full hour (at 9:00) some in the middle (at 12:30)
'0000-0259'
'0300-0859'
'0900-1229'
'1230-1659'
'1700-1929'
'1930-2029'
'2030-2359'
I have been trying to do that via dict. with each hour having a time slot but 1230 time slots are tricky.
try 2 was with between_time
but it requires DateTimeIndex - does not work here
def time_slot(ref):
if ref.between_time('00:00','02:59'):
return '0000-0259'
elif ref.between_time('03:00','08:59'):
return '0300-0859'
elif ref.between_time('09:00','12:29'):
return '0900-1229'
elif ref.between_time('12:30','16:59'):
return '1230-1659'
elif ref.between_time('17:00','19:29'):
return '1700-1929'
elif ref.between_time('19:30','20:29'):
return '1930-2029'
else:
return '2030-2359'
try 3 was set up nested if with < below selected time lost
format = '%H:%M'
def time_slot(ref):
if ref < dt.strptime('03:00', format):
return '0000-0259'
elif ref < dt.strptime('09:00', format):
return '0300-0859'
elif ref < dt.strptime('12:30', format):
return '0900-1229'
elif ref < dt.strptime('17:00', format):
return '1700-1929'
elif ref < dt.strptime('19:30', format):
return '1930-2029'
else:
return '2030-2359'
but I haven't compared datetime.time
with datetime.datetime
.
CodePudding user response:
Given that the initial time data is in a string format, this is how I would proceed: Given a dataframe of form:
Time
0 2021-11-08 00:10:00
1 2021-11-08 01:10:00
2 2021-11-08 02:25:00
3 2021-11-08 03:55:00
4 2021-11-08 06:55:00
5 2021-11-08 12:35:00
Step 1. Add a timestamp column
df['TimeStamp'] = df.apply(lambda row: du.parser.parse(row.Time), axis = 1)
Producing:
Time TimeStamp
0 2021-11-08 00:10:00 2021-11-08 00:10:00
1 2021-11-08 01:10:00 2021-11-08 01:10:00
2 2021-11-08 02:25:00 2021-11-08 02:25:00
3 2021-11-08 03:55:00 2021-11-08 03:55:00
4 2021-11-08 06:55:00 2021-11-08 06:55:00
5 2021-11-08 12:35:00 2021-11-08 12:35:00
Step # 2, create a function which will return a time slot label for each timestamp as follows:
def getLabel(tval):
""" Return the label associated with the timestamp """
labels = ['0000-0259', '0300-0859', '0900-1229', '1230-1659', '1700-1929', '1930-2029', '2030-2359' ]
slot_start = [(0, 0), (3, 0), (9, 0), (12, 30), (17, 0), (19,30), (20, 30)]
for lidx, tme in enumerate(slot_start):
if tme[0] > tval.hour:
return labels[lidx-1]
elif tval.hour == tme[0] and tme[1] <= tval.minute:
return labels[lidx]
return labels[-1]
Step 3 Apply the getLabel function to create a Time_Ref column as follows:
df['Time_Ref'] = df.apply(lambda row: getLabel(row.TimeStamp), axis=1)
Which yields:
Time TimeStamp Time_Ref
0 2021-11-08 00:10:00 2021-11-08 00:10:00 0000-0259
1 2021-11-08 01:10:00 2021-11-08 01:10:00 0000-0259
2 2021-11-08 02:25:00 2021-11-08 02:25:00 0000-0259
3 2021-11-08 03:55:00 2021-11-08 03:55:00 0300-0859
4 2021-11-08 06:55:00 2021-11-08 06:55:00 0300-0859
5 2021-11-08 12:35:00 2021-11-08 12:35:00 1230-1659
6 2021-11-08 16:05:00 2021-11-08 16:05:00 1230-1659
7 2021-11-08 17:10:00 2021-11-08 17:10:00 1700-1929
8 2021-11-08 18:45:00 2021-11-08 18:45:00 1700-1929
9 2021-11-08 19:10:00 2021-11-08 19:10:00 1930-2029
10 2021-11-08 20:25:00 2021-11-08 20:25:00 2030-2359
11 2021-11-08 20:55:00 2021-11-08 20:55:00 2030-2359
You can also combine steps 2 & 3 which eliminates adding the timestamp column with the following:
df['Time_Ref'] = df.apply(lambda row: getLabel(du.parser.parse(row.Time)), axis=1)