Home > OS >  Splitting Dataframe time into morning and evening
Splitting Dataframe time into morning and evening

Time:09-26

I have a df that looks like this (shortened):

             DateTime      Value        Date   Time   
0  2022-09-18 06:00:00       5.4  18/09/2022  06:00  
1  2022-09-18 07:00:00       6.0  18/09/2022  07:00  
2  2022-09-18 08:00:00       6.5  18/09/2022  08:00  
3  2022-09-18 09:00:00       6.7  18/09/2022  09:00  
8  2022-09-18 14:00:00       7.9  18/09/2022  14:00  
9  2022-09-18 15:00:00       7.8  18/09/2022  15:00  
10 2022-09-18 16:00:00       7.6  18/09/2022  16:00  
11 2022-09-18 17:00:00       6.8  18/09/2022  17:00  
12 2022-09-18 18:00:00       6.4  18/09/2022  18:00   
13 2022-09-18 19:00:00       5.7  18/09/2022  19:00   
14 2022-09-18 20:00:00       4.8  18/09/2022  20:00   
15 2022-09-18 21:00:00       5.4  18/09/2022  21:00   
16 2022-09-18 22:00:00       4.7  18/09/2022  22:00   
17 2022-09-18 23:00:00       4.3  18/09/2022  23:00   
18 2022-09-19 00:00:00       4.1  19/09/2022  00:00    
19 2022-09-19 01:00:00       4.4  19/09/2022  01:00    
22 2022-09-19 04:00:00       3.5  19/09/2022  04:00    
23 2022-09-19 05:00:00       2.8  19/09/2022  05:00    
24 2022-09-19 06:00:00       3.8  19/09/2022  06:00  

I want to create a new column where i split the between day and night like this:

00:00 - 05:00 night , 06:00 - 18:00 day , 19:00 - 23:00 night

But apparently one can't use same label? How can I solve this problem? Here is my code

    
df['period'] =  pd.cut(pd.to_datetime(df.DateTime).dt.hour,
       bins=[0, 5, 17, 23],
       labels=['night', 'morning', 'night'],
       include_lowest=True)

It's returning

ValueError: labels must be unique if ordered=True; pass ordered=False for duplicate labels

CodePudding user response:

if i understood correctly, if time is between 00:00 - 05:00 or 19:00 - 23:00, you want your new column to say 'night', else 'day', well here's that code:

df['day/night'] = df['Time'].apply(lambda x: 'night' if '00:00' <= x <= '05:00' or '19:00' <= x <= '23:00' else 'day')

or you can add ordered = false parameter using your method

input ->

df = pd.DataFrame(columns=['DateTime', 'Value', 'Date', 'Time'], data=[
    ['2022-09-18 06:00:00', 5.4, '18/09/2022', '06:00'],
    ['2022-09-18 07:00:00', 6.0, '18/09/2022', '07:00'],
    ['2022-09-18 08:00:00', 6.5, '18/09/2022', '08:00'],
    ['2022-09-18 09:00:00', 6.7, '18/09/2022', '09:00'],
    ['2022-09-18 14:00:00', 7.9, '18/09/2022', '14:00'],
    ['2022-09-18 15:00:00', 7.8, '18/09/2022', '15:00'],
    ['2022-09-18 16:00:00', 7.6, '18/09/2022', '16:00'],
    ['2022-09-18 17:00:00', 6.8, '18/09/2022', '17:00'],
    ['2022-09-18 18:00:00', 6.4, '18/09/2022', '18:00'],
    ['2022-09-18 19:00:00', 5.7, '18/09/2022', '19:00'],
    ['2022-09-18 20:00:00', 4.8, '18/09/2022', '20:00'],
    ['2022-09-18 21:00:00', 5.4, '18/09/2022', '21:00'],
    ['2022-09-18 22:00:00', 4.7, '18/09/2022', '22:00'],
    ['2022-09-18 23:00:00', 4.3, '18/09/2022', '23:00'],
    ['2022-09-19 00:00:00', 4.1, '19/09/2022', '00:00'],
    ['2022-09-19 01:00:00', 4.4, '19/09/2022', '01:00'],
    ['2022-09-19 04:00:00', 3.5, '19/09/2022', '04:00'],
    ['2022-09-19 05:00:00', 2.8, '19/09/2022', '05:00'],
    ['2022-09-19 06:00:00', 3.8, '19/09/2022', '06:00']])

output ->

               DateTime  Value        Date   Time is_0600_0900
0   2022-09-18 06:00:00    5.4  18/09/2022  06:00          day
1   2022-09-18 07:00:00    6.0  18/09/2022  07:00          day
2   2022-09-18 08:00:00    6.5  18/09/2022  08:00          day
3   2022-09-18 09:00:00    6.7  18/09/2022  09:00          day
4   2022-09-18 14:00:00    7.9  18/09/2022  14:00          day
5   2022-09-18 15:00:00    7.8  18/09/2022  15:00          day
6   2022-09-18 16:00:00    7.6  18/09/2022  16:00          day
7   2022-09-18 17:00:00    6.8  18/09/2022  17:00          day
8   2022-09-18 18:00:00    6.4  18/09/2022  18:00          day
9   2022-09-18 19:00:00    5.7  18/09/2022  19:00        night
10  2022-09-18 20:00:00    4.8  18/09/2022  20:00        night
11  2022-09-18 21:00:00    5.4  18/09/2022  21:00        night
12  2022-09-18 22:00:00    4.7  18/09/2022  22:00        night
13  2022-09-18 23:00:00    4.3  18/09/2022  23:00        night
14  2022-09-19 00:00:00    4.1  19/09/2022  00:00        night
15  2022-09-19 01:00:00    4.4  19/09/2022  01:00        night
16  2022-09-19 04:00:00    3.5  19/09/2022  04:00        night
17  2022-09-19 05:00:00    2.8  19/09/2022  05:00        night
18  2022-09-19 06:00:00    3.8  19/09/2022  06:00          day

CodePudding user response:

You have two options.

Either you don't care about the order and you can set ordered=False as parameter of cut:

df['period'] =  pd.cut(pd.to_datetime(df.DateTime).dt.hour,
       bins=[0, 5, 17, 23],
       labels=['night', 'morning', 'night'],
       ordered=False,
       include_lowest=True)

Or you care to have night and morning ordered, in which case you can further convert to ordered Categorical:

df['period'] = pd.Categorical(df['period'], categories=['night', 'morning'], ordered=True)

output:

               DateTime  Value        Date   Time   period
0   2022-09-18 06:00:00    5.4  18/09/2022  06:00  morning
1   2022-09-18 07:00:00    6.0  18/09/2022  07:00  morning
2   2022-09-18 08:00:00    6.5  18/09/2022  08:00  morning
3   2022-09-18 09:00:00    6.7  18/09/2022  09:00  morning
8   2022-09-18 14:00:00    7.9  18/09/2022  14:00  morning
9   2022-09-18 15:00:00    7.8  18/09/2022  15:00  morning
10  2022-09-18 16:00:00    7.6  18/09/2022  16:00  morning
11  2022-09-18 17:00:00    6.8  18/09/2022  17:00  morning
12  2022-09-18 18:00:00    6.4  18/09/2022  18:00    night
13  2022-09-18 19:00:00    5.7  18/09/2022  19:00    night
14  2022-09-18 20:00:00    4.8  18/09/2022  20:00    night
15  2022-09-18 21:00:00    5.4  18/09/2022  21:00    night
16  2022-09-18 22:00:00    4.7  18/09/2022  22:00    night
17  2022-09-18 23:00:00    4.3  18/09/2022  23:00    night
18  2022-09-19 00:00:00    4.1  19/09/2022  00:00    night
19  2022-09-19 01:00:00    4.4  19/09/2022  01:00    night
22  2022-09-19 04:00:00    3.5  19/09/2022  04:00    night
23  2022-09-19 05:00:00    2.8  19/09/2022  05:00    night
24  2022-09-19 06:00:00    3.8  19/09/2022  06:00  morning

column:

df['period']

0     morning
1     morning
2     morning
...
23      night
24    morning
Name: period, dtype: category
Categories (2, object): ['morning', 'night']
  • Related