I have some data where I would like to subset between different times of day for different dates, specifically I want the column df['event']
to have a 1
if the time is between 9am to 11am, a 2
if it is between 3pm to 4pm, and 0
otherwise. As such, I tried to use the following line:
df['event'] = np.where( (df['datetime'] >= datetime.strptime(f'{date} 09:00:00', '%Y-%m-%d %H:%M:%S')) & (df['datetime'] < datetime.strptime(f'{date} 11:00:00', '%Y-%m-%d %H:%M:%S')) , 1, np.where((df['datetime'] >= datetime.strptime(f'{date} 15:00:00', '%Y-%m-%d %H:%M:%S')) & (df['timestamp'] < datetime.strptime(f'{date} 16:00:00', '%Y-%m-%d %H:%M:%S')) , 2, 0))
This however, doesn't seem to work. I figured that there's a way to do it by indexing on times, but this seems to be quite tricky since this line of code is inside a function, and when I index on times, it seems to chuck up a bunch of errors.
Is there a better way of doing this?
CodePudding user response:
There are 3 values to choose from, so use np.select
instead
import datetime
time1 = pd.to_datetime(df['datetime']).dt.time
# select 1 if time is between 9 and 11
# 2 if it's between 3pm and 4pm
# and 0 otherwise
condlist = [time1.between(datetime.time(9,0,0), datetime.time(11,0,0)),
time1.between(datetime.time(15,0,0), datetime.time(16,0,0))]
# select 1,2 or 0 depending on which condition is satisfied
df['event'] = np.select(condlist, [1, 2], default=0)
CodePudding user response:
If need working only with times here is solution with DatetimeIndex.indexer_between_time
:
df = pd.DataFrame({'datetime':pd.date_range('2020-01-01', freq='58min', periods=20),
'a':1})
df = df.set_index('datetime')
id1 = df.index.indexer_between_time('09:00:00','11:00:00')
id2 = df.index.indexer_between_time('15:00:00','16:00:00')
arr = np.zeros(len(df), dtype=int)
arr[id1] = 1
arr[id2] = 2
df['event1'] = arr
Or with:
df = df.set_index('datetime')
id1 = df.index.indexer_between_time('09:00:00','11:00:00')
id2 = df.index.indexer_between_time('15:00:00','16:00:00')
arr = np.arange(len(df))
df['event1'] = np.select([np.in1d(arr, id1), np.in1d(arr, id2)], [1, 2], default=0)
print (df)
a event1
datetime
2020-01-01 00:00:00 1 0
2020-01-01 00:58:00 1 0
2020-01-01 01:56:00 1 0
2020-01-01 02:54:00 1 0
2020-01-01 03:52:00 1 0
2020-01-01 04:50:00 1 0
2020-01-01 05:48:00 1 0
2020-01-01 06:46:00 1 0
2020-01-01 07:44:00 1 0
2020-01-01 08:42:00 1 0
2020-01-01 09:40:00 1 1
2020-01-01 10:38:00 1 1
2020-01-01 11:36:00 1 0
2020-01-01 12:34:00 1 0
2020-01-01 13:32:00 1 0
2020-01-01 14:30:00 1 0
2020-01-01 15:28:00 1 2
2020-01-01 16:26:00 1 0
2020-01-01 17:24:00 1 0
2020-01-01 18:22:00 1 0
If dont need DatetimeIndex
finally assign it to helper df1
DataFrame:
df1 = df.set_index('datetime')
id1 = df1.index.indexer_between_time('09:00:00','11:00:00')
id2 = df1.index.indexer_between_time('15:00:00','16:00:00')
arr = np.arange(len(df))
df['event1'] = np.select([np.in1d(arr, id1), np.in1d(arr, id2)], [1, 2], default=0)
print (df)
datetime a event1
0 2020-01-01 00:00:00 1 0
1 2020-01-01 00:58:00 1 0
2 2020-01-01 01:56:00 1 0
3 2020-01-01 02:54:00 1 0
4 2020-01-01 03:52:00 1 0
5 2020-01-01 04:50:00 1 0
6 2020-01-01 05:48:00 1 0
7 2020-01-01 06:46:00 1 0
8 2020-01-01 07:44:00 1 0
9 2020-01-01 08:42:00 1 0
10 2020-01-01 09:40:00 1 1
11 2020-01-01 10:38:00 1 1
12 2020-01-01 11:36:00 1 0
13 2020-01-01 12:34:00 1 0
14 2020-01-01 13:32:00 1 0
15 2020-01-01 14:30:00 1 0
16 2020-01-01 15:28:00 1 2
17 2020-01-01 16:26:00 1 0
18 2020-01-01 17:24:00 1 0
19 2020-01-01 18:22:00 1 0