I have a dataset where the date_time column was split into separate date and time columns, so that the date can be used independently of the time in different scenarios. Now I need to get the rows whose time falls between 5:00 and 8:00, but I can only find pandas functions that work on full datetimes. Is there any way to filter on a time-only column?
I think part of the issue is the data type of the time column. I tried removing the colon from the time value, so that 5:00 becomes 500, but I still can't select the values I need. I keep getting a KeyError on 'time'.
Here is what I tried so far:
# Get bird sightings between 5-8am. Remove the colon in time first.
early_birds_df = france_df['time'].str.replace(':','')
# Convert time to a numeric data type, so we can treat it like a number
early_birds_df['time'] = pd.to_numeric(early_birds_df['time'], errors='coerce')
early_birds_df.head()
But this returns an error:
KeyError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2897 try:
-> 2898 return self._engine.get_loc(casted_key)
2899 except KeyError as err:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index_class_helper.pxi in pandas._libs.index.Int64Engine._check_type()
KeyError: 'time'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
3 frames
/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2898 return self._engine.get_loc(casted_key)
2899 except KeyError as err:
-> 2900 raise KeyError(key) from err
2901
2902 if tolerance is not None:
KeyError: 'time'
Here is a sample of the data to use as an example. I want to use the 'time' column; the leftmost column is the index. Let's say I want to retrieve all rows with times between 1:00 and 3:10. What code can I use to do that?
    date       time
1   8/15/2013  0:18
2   8/15/2013  0:48
3   8/15/2013  1:17
4   8/15/2013  1:47
5   8/15/2013  2:17
6   8/15/2013  2:47
7   8/15/2013  3:02
8   8/15/2013  3:17
9   8/15/2013  3:32
10  8/15/2013  3:47
CodePudding user response:
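If the boundaries fall on whole hours, you can filter on the hour component (for your example of 5:00 and 8:00). Note that between is inclusive on both ends, so this also keeps every row from 8:00 through 8:59; use between(5, 7) if the window should stop before 8:00.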
df[df["date_time"].dt.hour.between(5,8)]
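To see the inclusive behavior concretely, here is a minimal sketch. The timestamps are made up for illustration and are not from the question's dataset.

```python
import pandas as pd

# Made-up sample timestamps, for illustration only
df = pd.DataFrame({"date_time": pd.to_datetime([
    "2013-08-15 04:59", "2013-08-15 05:00", "2013-08-15 07:59",
    "2013-08-15 08:00", "2013-08-15 08:30", "2013-08-15 09:00",
])})

hours = df["date_time"].dt.hour

# between() is inclusive on both ends: hours 5, 6, 7 and 8 all match,
# so 08:30 is kept as well
through_hour_8 = df[hours.between(5, 8)]

# Hours 5, 6 and 7 only, i.e. 05:00 through 07:59
before_8 = df[hours.between(5, 7)]
```

Neither variant gives "05:00 through 08:00 exactly"; that needs one of the minute-aware approaches.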
To be more general, you can use pandas.DatetimeIndex.indexer_between_time, but this requires converting your timestamp series to a DatetimeIndex first, i.e.
df["date_time"].iloc[pd.DatetimeIndex(df["date_time"]).indexer_between_time("05:00", "08:00")]
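As a hedged sketch of the same idea applied to whole rows: the integer positions returned by indexer_between_time can be passed to df.iloc, and DataFrame.between_time selects the same rows when the datetimes are set as the index. The sample data and the "sighting" column are made up for illustration.

```python
import pandas as pd

# Made-up sample data, for illustration only
df = pd.DataFrame({
    "date_time": pd.to_datetime([
        "2013-08-15 04:59", "2013-08-15 05:00", "2013-08-15 07:59",
        "2013-08-15 08:00", "2013-08-15 08:30",
    ]),
    "sighting": ["a", "b", "c", "d", "e"],
})

# Integer positions of the rows whose time of day is in [05:00, 08:00]
positions = pd.DatetimeIndex(df["date_time"]).indexer_between_time("05:00", "08:00")
by_position = df.iloc[positions]

# Alternative: make the datetimes the index and use between_time directly
by_index = df.set_index("date_time").between_time("05:00", "08:00")
```

Both endpoints are included by default, so 08:00 is kept and 08:30 is not.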
Or you can convert the times to timedeltas since the start of the day and then compare against Timedelta values, e.g.
time = df["date_time"] - df["date_time"].dt.floor("D")
df[time.between(pd.Timedelta("05:00:00"), pd.Timedelta("08:00:00"))]
Edit: Just saw the new data format with a separate time column. In that case you can append seconds to the strings so that they can be parsed with to_timedelta, and use the resulting boolean mask to select rows, e.g.
df[pd.to_timedelta(df["time"] + ":00").between(pd.to_timedelta("05:00:00"), pd.to_timedelta("08:00:00"))]
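For the sample table in the question (times between 1:00 and 3:10), a minimal sketch might look like the following. It also shows the likely cause of the KeyError: france_df['time'].str.replace(...) returns a Series rather than a DataFrame, so a later ['time'] lookup searches the row labels instead of the columns. The variable names here are placeholders.

```python
import pandas as pd

# The sample rows from the question, with the same 1..10 index
df = pd.DataFrame(
    {
        "date": ["8/15/2013"] * 10,
        "time": ["0:18", "0:48", "1:17", "1:47", "2:17",
                 "2:47", "3:02", "3:17", "3:32", "3:47"],
    },
    index=range(1, 11),
)

# Selecting one column and calling .str.replace gives back a Series,
# not a DataFrame, so as_series["time"] would look for a row labeled
# "time" and raise KeyError
as_series = df["time"].str.replace(":", "")

# Append ":00" so each string reads H:MM:SS, then parse as a timedelta
elapsed = pd.to_timedelta(df["time"] + ":00")

# Boolean mask, inclusive on both ends, used to select whole rows
mask = elapsed.between(pd.Timedelta("01:00:00"), pd.Timedelta("03:10:00"))
result = df[mask]
```

Here result holds the rows labeled 3 through 7 (1:17 to 3:02), and the original time column is left unchanged.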