Home > Mobile >  How to use pandas to get data in between certain times within a DataFrame
How to use pandas to get data in between certain times within a DataFrame

Time:12-10

Here is a small snippet of a dataFrame I am working with:

     fruit       time
0    apple       2021-12-20 17:55:00
1    bannana     2021-12-23 05:13:00
2    apple       2021-12-20 17:55:00

How can I go about getting data in between certain timestamps. Such as all data in between 17:00:00 and 18:00:00.

In addition if possible, I would like to get data in between certain timestamps who's fruit value equals "apple"

I have tried df.between_time but I get the error: TypeError: Index must be DatetimeIndex. Seems like the issue is with the timestamp formatting.

CodePudding user response:

Using .query() and .assign()

df["date"], df["time"] = zip(*[(str(x.date()), str(x.time())) for x in pd.to_datetime(df["time"])])
df = (df
      .query("time.between('17:00:00', '18:00:00', inclusive='both') & fruit.eq('apple')")
      .assign(time=pd.to_datetime(df["date"].str.cat(df["time"], sep=" ")))
      ).drop(columns="date")

print(df)

Output:

   fruit                time
0  apple 2021-12-20 17:55:00
2  apple 2021-12-20 17:55:00

CodePudding user response:

Solution 1: boolean indexing to filter the rows where hour is between 17 and 18:

df[df['fruit'].eq('apple') & df['time'].dt.hour.between(17, 18)]

Solution 2: Set the index to time column then use between_time to filter the rows

(
    df
    .set_index('time')
    .query("fruit == 'apple'")
    .between_time('17:00:00', '18:00:00')
    .reset_index()
)

Result

   fruit                time
0  apple 2021-12-20 17:55:00
2  apple 2021-12-20 17:55:00
  • Related