Home > Net >  Subset between 4 different times in Python
Subset between 4 different times in Python

Time:06-13

I have some data where I would like to subset between different times of day for different dates, specifically I want the column df['event'] to have a 1 if the time is between 9am to 11am, a 2 if it is between 3pm to 4pm, and 0 otherwise. As such, I tried to use the following line:

df['event'] = np.where( (df['datetime'] >= datetime.strptime(f'{date} 09:00:00', '%Y-%m-%d %H:%M:%S')) & (df['datetime'] < datetime.strptime(f'{date} 11:00:00', '%Y-%m-%d %H:%M:%S')) , 1, np.where((df['datetime'] >= datetime.strptime(f'{date} 15:00:00', '%Y-%m-%d %H:%M:%S')) & (df['timestamp'] < datetime.strptime(f'{date} 16:00:00', '%Y-%m-%d %H:%M:%S')) , 2, 0))

This however, doesn't seem to work. I figured that there's a way to do it by indexing on times, but this seems to be quite tricky since this line of code is inside a function, and when I index on times, it seems to chuck up a bunch of errors.

Is there a better way of doing this?

CodePudding user response:

There are 3 values to choose from, so use np.select instead

import datetime
time1 = pd.to_datetime(df['datetime']).dt.time
# select 1 if time is between 9 and 11
# 2 if it's between 3pm and 4pm
# and 0 otherwise
condlist = [time1.between(datetime.time(9,0,0), datetime.time(11,0,0)), 
            time1.between(datetime.time(15,0,0), datetime.time(16,0,0))]
# select 1,2 or 0 depending on which condition is satisfied
df['event'] = np.select(condlist, [1, 2], default=0)

CodePudding user response:

If need working only with times here is solution with DatetimeIndex.indexer_between_time:

df = pd.DataFrame({'datetime':pd.date_range('2020-01-01', freq='58min', periods=20), 
                    'a':1})

df = df.set_index('datetime')

id1 = df.index.indexer_between_time('09:00:00','11:00:00')
id2 = df.index.indexer_between_time('15:00:00','16:00:00')

arr = np.zeros(len(df), dtype=int)
arr[id1] = 1
arr[id2] = 2
df['event1'] =  arr 

Or with:

df = df.set_index('datetime')

id1 = df.index.indexer_between_time('09:00:00','11:00:00')
id2 = df.index.indexer_between_time('15:00:00','16:00:00')

arr = np.arange(len(df))

df['event1'] = np.select([np.in1d(arr, id1), np.in1d(arr, id2)], [1, 2], default=0)

print (df)
                     a  event1
datetime                      
2020-01-01 00:00:00  1       0
2020-01-01 00:58:00  1       0
2020-01-01 01:56:00  1       0
2020-01-01 02:54:00  1       0
2020-01-01 03:52:00  1       0
2020-01-01 04:50:00  1       0
2020-01-01 05:48:00  1       0
2020-01-01 06:46:00  1       0
2020-01-01 07:44:00  1       0
2020-01-01 08:42:00  1       0
2020-01-01 09:40:00  1       1
2020-01-01 10:38:00  1       1
2020-01-01 11:36:00  1       0
2020-01-01 12:34:00  1       0
2020-01-01 13:32:00  1       0
2020-01-01 14:30:00  1       0
2020-01-01 15:28:00  1       2
2020-01-01 16:26:00  1       0
2020-01-01 17:24:00  1       0
2020-01-01 18:22:00  1       0

If dont need DatetimeIndex finally assign it to helper df1 DataFrame:

df1 = df.set_index('datetime')

id1 = df1.index.indexer_between_time('09:00:00','11:00:00')
id2 = df1.index.indexer_between_time('15:00:00','16:00:00')

arr = np.arange(len(df))

df['event1'] = np.select([np.in1d(arr, id1), np.in1d(arr, id2)], [1, 2], default=0)
print (df)
              datetime  a  event1
0  2020-01-01 00:00:00  1       0
1  2020-01-01 00:58:00  1       0
2  2020-01-01 01:56:00  1       0
3  2020-01-01 02:54:00  1       0
4  2020-01-01 03:52:00  1       0
5  2020-01-01 04:50:00  1       0
6  2020-01-01 05:48:00  1       0
7  2020-01-01 06:46:00  1       0
8  2020-01-01 07:44:00  1       0
9  2020-01-01 08:42:00  1       0
10 2020-01-01 09:40:00  1       1
11 2020-01-01 10:38:00  1       1
12 2020-01-01 11:36:00  1       0
13 2020-01-01 12:34:00  1       0
14 2020-01-01 13:32:00  1       0
15 2020-01-01 14:30:00  1       0
16 2020-01-01 15:28:00  1       2
17 2020-01-01 16:26:00  1       0
18 2020-01-01 17:24:00  1       0
19 2020-01-01 18:22:00  1       0
  • Related