Suppose we have a dataframe including time indices and we want to extract only a dataframe including 10:23 to 14:34. How can we do this?
n =1000
i = pd.date_range('2018-04-09', periods=n, freq='1min')
ts = pd.DataFrame({'A': [i for i in range(n)]}, index=i)
print(ts)
A
2018-04-09 00:00:00 0
2018-04-09 00:01:00 1
2018-04-09 00:02:00 2
2018-04-09 00:03:00 3
2018-04-09 00:04:00 4
... ...
2018-04-09 16:35:00 995
2018-04-09 16:36:00 996
2018-04-09 16:37:00 997
2018-04-09 16:38:00 998
2018-04-09 16:39:00 999
My try:
I think for every problem like this, we need to break it into 3 conditions. Correct me if I am wrong.
mask1 = ( 10 == ts.index.hour & 23 <= ts.index.minute)
mask2 = ( 10 <= ts.index.hour )
mask3 = ( 14 == ts.index.hour & 34 >= ts.index.minute)
mask = mask1 | mask2 | mask3
ts_desire = ts[mask]
Then I get TypeError: Input must be Index or array-like
.
CodePudding user response:
Update
Why it starts from 10? It is supposed to start from 10:23 inclusive and ends at 16:34 inclusive
Maybe your are looking for between_time
:
>>> ts.between_time('10:23', '16:34')
A
2018-04-09 10:23:00 623
2018-04-09 10:24:00 624
2018-04-09 10:25:00 625
2018-04-09 10:26:00 626
2018-04-09 10:27:00 627
... ...
2018-04-09 16:30:00 990
2018-04-09 16:31:00 991
2018-04-09 16:32:00 992
2018-04-09 16:33:00 993
2018-04-09 16:34:00 994
[372 rows x 1 columns]
Missing ( )
. Take care of operator priority: &
take precedence over ==
.
# HERE ----v---v
mask1 = (10 == ts.index.hour) & (23 <= ts.index.minute)
mask2 = (10 <= ts.index.hour)
mask3 = (14 == ts.index.hour) & (34 >= ts.index.minute)
# HERE ----^---^
mask = mask1 | mask2 | mask3
ts_desire = ts[mask]
Output:
>>> ts_desire
A
2018-04-09 10:00:00 600
2018-04-09 10:01:00 601
2018-04-09 10:02:00 602
2018-04-09 10:03:00 603
2018-04-09 10:04:00 604
... ...
2018-04-09 16:35:00 995
2018-04-09 16:36:00 996
2018-04-09 16:37:00 997
2018-04-09 16:38:00 998
2018-04-09 16:39:00 999
[400 rows x 1 columns]