How to get values in a pandas dataframe column between 2 times?

Time:11-12

I have a dataset where the date_time column was split into separate date and time columns, so that the date can be used independently of the time in different scenarios. But now I need to get the time values between 5:00 and 8:00. I can only find pandas functions that work on datetimes. Is there any way to select values from a time-only column?

I think part of the issue is the data type of the time column. I have tried removing the colon from the time value, so that 5:00 becomes 500, but I am still unable to select the values I need. I keep getting a KeyError on 'time'.

Here is what I tried so far:

# Get bird sightings between 5-8am. Remove the colon in time first.
early_birds_df = france_df['time'].str.replace(':','')

# Convert time to a numeric data type, so we can treat it like a number
early_birds_df['time'] = pd.to_numeric(early_birds_df['time'], errors='coerce')
early_birds_df.head()

But this returns an error:

KeyError                                  Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2897             try:
-> 2898                 return self._engine.get_loc(casted_key)
   2899             except KeyError as err:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index_class_helper.pxi in pandas._libs.index.Int64Engine._check_type()

KeyError: 'time'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
3 frames
/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2898                 return self._engine.get_loc(casted_key)
   2899             except KeyError as err:
-> 2900                 raise KeyError(key) from err
   2901 
   2902         if tolerance is not None:

KeyError: 'time'

Here is a sample of the data as an example. I want to filter on the 'time' column; the frame also has an integer index. Let's say I want to retrieve all rows with times between 1:00 and 3:10. What code can I use to do that?

        date        time
1       8/15/2013   0:18
2       8/15/2013   0:48
3       8/15/2013   1:17
4       8/15/2013   1:47
5       8/15/2013   2:17
6       8/15/2013   2:47
7       8/15/2013   3:02
8       8/15/2013   3:17
9       8/15/2013   3:32
10      8/15/2013   3:47

CodePudding user response:

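If the boundaries fall on whole hours, you can filter on the hour of a datetime column (for your example of 5:00 and 8:00). Note that between is inclusive at both ends, so between(5,8) also keeps times from 8:01 to 8:59; use between(5,7) to keep only 5:00 through 7:59.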

df[df["date_time"].dt.hour.between(5,8)]
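To make the inclusive behaviour of the hour filter concrete, here is a minimal sketch. The frame and its values are made up for illustration; only the "date_time" column name comes from the line above.

```python
import pandas as pd

# Hypothetical data: a single datetime column named "date_time"
df = pd.DataFrame({
    "date_time": pd.to_datetime([
        "2013-08-15 04:59", "2013-08-15 05:00", "2013-08-15 07:59",
        "2013-08-15 08:00", "2013-08-15 08:30",
    ])
})

hours = df["date_time"].dt.hour

# between() is inclusive on both ends, so hour 8 keeps 08:00 and 08:30
inclusive_hours = df[hours.between(5, 8)]

# Restricting to hours 5..7 keeps 05:00 through 07:59 only
before_eight = df[hours.between(5, 7)]
```

Which of the two you want depends on whether sightings after 8:00 should count.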

To be more general you can use pandas.DatetimeIndex.indexer_between_time but this requires converting your timestamp series to a DatetimeIndex first, i.e.

df["date_time"].iloc[pd.DatetimeIndex(df["date_time"]).indexer_between_time("05:00", "08:00")]
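As a rough illustration of the indexer approach, the sketch below uses the returned integer positions with df.iloc to pull whole rows rather than just the datetime column. The data and the "species" column are hypothetical.

```python
import pandas as pd

# Hypothetical data; "species" is only there to show that whole rows come back
df = pd.DataFrame({
    "date_time": pd.to_datetime([
        "2013-08-15 04:59", "2013-08-15 05:00", "2013-08-15 06:30",
        "2013-08-15 08:00", "2013-08-15 08:01",
    ]),
    "species": ["a", "b", "c", "d", "e"],
})

# indexer_between_time returns integer positions; both endpoints are
# included by default
positions = pd.DatetimeIndex(df["date_time"]).indexer_between_time("05:00", "08:00")
rows = df.iloc[positions]
```

Because the result is positional, it should work regardless of what the frame's own index looks like.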

Alternatively, you can convert the times to timedeltas since the start of the day and then compare against Timedelta values, e.g.

time = df["date_time"] - df["date_time"].dt.floor("D")
df[time.between(pd.Timedelta("05:00:00"), pd.Timedelta("08:00:00"))]
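The timedelta version also handles boundaries that are not on the hour. A small sketch, assuming a datetime column named "date_time" and using the 1:00 to 3:10 window from the question (the timestamps themselves are made up):

```python
import pandas as pd

# Hypothetical data with a combined datetime column
df = pd.DataFrame({
    "date_time": pd.to_datetime([
        "2013-08-15 00:48", "2013-08-15 01:17", "2013-08-15 03:02",
        "2013-08-15 03:17", "2013-08-16 02:00",
    ])
})

# Time elapsed since midnight of each row's own day
since_midnight = df["date_time"] - df["date_time"].dt.floor("D")

# Inclusive comparison against the two boundaries
in_window = df[since_midnight.between(pd.Timedelta("01:00:00"),
                                      pd.Timedelta("03:10:00"))]
```

Note that the date is ignored here, so the row from the following day is kept as well.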

Edit:

I just saw the new data format with a separate time column. In that case you can append seconds to the strings so that they can be parsed with to_timedelta, which gives a boolean mask you can use to index df, e.g.

pd.to_timedelta(df["time"] + ":00").between(pd.to_timedelta("05:00:00"), pd.to_timedelta("08:00:00"))
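Applied to the sample data from the question, with its 1:00 to 3:10 window, this could look like the sketch below. It assumes the time column holds strings in H:MM form, as shown in the sample; the variable names are just for illustration.

```python
import pandas as pd

# Sample rows from the question, with the same 1-based index
df = pd.DataFrame(
    {
        "date": ["8/15/2013"] * 10,
        "time": ["0:18", "0:48", "1:17", "1:47", "2:17",
                 "2:47", "3:02", "3:17", "3:32", "3:47"],
    },
    index=range(1, 11),
)

# Append seconds so each string reads H:MM:SS, then parse as timedeltas
since_midnight = pd.to_timedelta(df["time"] + ":00")

# Boolean mask, inclusive on both ends
mask = since_midnight.between(pd.Timedelta("01:00:00"), pd.Timedelta("03:10:00"))
selected = df[mask]
```

As a side note, the KeyError in the question most likely comes from the first step: france_df['time'].str.replace(':','') returns a Series rather than a DataFrame, so the later early_birds_df['time'] looks for an index label called 'time' and fails. Keeping the whole frame and assigning the converted values back to a column would avoid that.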