need help assigning custom timeslots to datetime data-CodePudding

I have a datetime data to a minute (sample below)

2021-11-08 00:10:00
2021-11-08 01:10:00
2021-11-08 02:25:00
2021-11-08 03:55:00
2021-11-08 06:55:00
2021-11-08 12:35:00
2021-11-08 16:05:00
2021-11-08 17:10:00
2021-11-08 18:45:00
2021-11-08 19:10:00
2021-11-08 20:25:00
2021-11-08 20:55:00
2021-11-08 22:55:00

and I need to assign a custom time slot below to that data set. some slots start at a full hour (at 9:00) some in the middle (at 12:30)

'0000-0259'
'0300-0859'
'0900-1229'
'1230-1659'
'1700-1929'
'1930-2029'
'2030-2359'

I have been trying to do that via dict. with each hour having a time slot but 1230 time slots are tricky.

try 2 was with between_time but it requires DateTimeIndex - does not work here

def time_slot(ref):
    if ref.between_time('00:00','02:59'):
        return '0000-0259'
    elif ref.between_time('03:00','08:59'):
        return '0300-0859'
    elif ref.between_time('09:00','12:29'):
        return '0900-1229'
    elif ref.between_time('12:30','16:59'):
        return '1230-1659'
    elif ref.between_time('17:00','19:29'):
        return '1700-1929'
    elif ref.between_time('19:30','20:29'):
        return '1930-2029'
    else:
        return '2030-2359'

try 3 was set up nested if with < below selected time lost

format = '%H:%M'

def time_slot(ref):
    if ref < dt.strptime('03:00', format):
        return '0000-0259'
    elif ref < dt.strptime('09:00', format):
        return '0300-0859'
    elif ref < dt.strptime('12:30', format):
        return '0900-1229'
    elif ref < dt.strptime('17:00', format):
        return '1700-1929'
    elif ref < dt.strptime('19:30', format):
        return '1930-2029'
    else:
        return '2030-2359'

but I haven't compared datetime.time with datetime.datetime.

CodePudding user response：

Given that the initial time data is in a string format, this is how I would proceed: Given a dataframe of form:

    Time
0   2021-11-08 00:10:00
1   2021-11-08 01:10:00
2   2021-11-08 02:25:00
3   2021-11-08 03:55:00
4   2021-11-08 06:55:00
5   2021-11-08 12:35:00

Step 1. Add a timestamp column

df['TimeStamp'] = df.apply(lambda row: du.parser.parse(row.Time), axis = 1)

Producing:

    Time    TimeStamp
0   2021-11-08 00:10:00     2021-11-08 00:10:00
1   2021-11-08 01:10:00     2021-11-08 01:10:00
2   2021-11-08 02:25:00     2021-11-08 02:25:00
3   2021-11-08 03:55:00     2021-11-08 03:55:00
4   2021-11-08 06:55:00     2021-11-08 06:55:00
5   2021-11-08 12:35:00     2021-11-08 12:35:00

Step # 2, create a function which will return a time slot label for each timestamp as follows:

def getLabel(tval):
    """ Return the label associated with the timestamp """
    labels = ['0000-0259', '0300-0859', '0900-1229', '1230-1659', '1700-1929', '1930-2029', '2030-2359' ]
    slot_start = [(0, 0), (3, 0), (9, 0), (12, 30), (17, 0), (19,30), (20, 30)]
    for lidx, tme in enumerate(slot_start):
        if tme[0] > tval.hour:
            return labels[lidx-1]
        elif tval.hour == tme[0] and tme[1] <= tval.minute:
            return labels[lidx]
    return labels[-1]

Step 3 Apply the getLabel function to create a Time_Ref column as follows:

df['Time_Ref'] = df.apply(lambda row: getLabel(row.TimeStamp), axis=1)

Which yields:

    Time    TimeStamp   Time_Ref
0   2021-11-08 00:10:00     2021-11-08 00:10:00     0000-0259
1   2021-11-08 01:10:00     2021-11-08 01:10:00     0000-0259
2   2021-11-08 02:25:00     2021-11-08 02:25:00     0000-0259
3   2021-11-08 03:55:00     2021-11-08 03:55:00     0300-0859
4   2021-11-08 06:55:00     2021-11-08 06:55:00     0300-0859
5   2021-11-08 12:35:00     2021-11-08 12:35:00     1230-1659
6   2021-11-08 16:05:00     2021-11-08 16:05:00     1230-1659
7   2021-11-08 17:10:00     2021-11-08 17:10:00     1700-1929
8   2021-11-08 18:45:00     2021-11-08 18:45:00     1700-1929
9   2021-11-08 19:10:00     2021-11-08 19:10:00     1930-2029
10  2021-11-08 20:25:00     2021-11-08 20:25:00     2030-2359
11  2021-11-08 20:55:00     2021-11-08 20:55:00     2030-2359

You can also combine steps 2 & 3 which eliminates adding the timestamp column with the following:

df['Time_Ref'] = df.apply(lambda row: getLabel(du.parser.parse(row.Time)), axis=1)