Here is a small snippet of a dataFrame I am working with:
fruit time
0 apple 2021-12-20 17:55:00
1 bannana 2021-12-23 05:13:00
2 apple 2021-12-20 17:55:00
How can I go about getting data in between certain timestamps. Such as all data in between 17:00:00 and 18:00:00.
In addition if possible, I would like to get data in between certain timestamps who's fruit value equals "apple"
I have tried df.between_time but I get the error: TypeError: Index must be DatetimeIndex. Seems like the issue is with the timestamp formatting.
CodePudding user response:
df["date"], df["time"] = zip(*[(str(x.date()), str(x.time())) for x in pd.to_datetime(df["time"])])
df = (df
.query("time.between('17:00:00', '18:00:00', inclusive='both') & fruit.eq('apple')")
.assign(time=pd.to_datetime(df["date"].str.cat(df["time"], sep=" ")))
).drop(columns="date")
print(df)
Output:
fruit time
0 apple 2021-12-20 17:55:00
2 apple 2021-12-20 17:55:00
CodePudding user response:
Solution 1: boolean indexing to filter the rows where hour
is between 17
and 18
:
df[df['fruit'].eq('apple') & df['time'].dt.hour.between(17, 18)]
Solution 2: Set the index to time
column then use between_time
to filter the rows
(
df
.set_index('time')
.query("fruit == 'apple'")
.between_time('17:00:00', '18:00:00')
.reset_index()
)
Result
fruit time
0 apple 2021-12-20 17:55:00
2 apple 2021-12-20 17:55:00