Using pandas.Grouper to split datetime.time column into time ranges

Time:10-27

I am reading from an Excel file that has a column with times. Since I can't upload the actual file, I created the variable timeIntervals to illustrate.

When I run this code...

import pandas as pd
import datetime

def main():
    timeIntervals = pd.date_range("11:00", "21:30", freq="30min").time
    df = pd.DataFrame({"Times": timeIntervals})
    grp = pd.Grouper(key="Times", freq="3H")
    value = df.groupby(grp).count()
    print(value)

if __name__ == '__main__':
    main()

I get the following error:

TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index'

How can I use pandas.Grouper together with DataFrame.groupby to group the DataFrame df into discrete 3-hour time ranges? Are there any alternatives?

Answer:

A few issues:

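  1. Taking .time from a date_range yields plain datetime.time objects in an object-dtype column. That loses the datetime dtype Grouper needs to group on a time window, which is why the error reports a plain 'Index'.
  2. count counts the non-NaN values in each column, so a column has to be selected explicitly: once Times is used as the grouping key, the sample frame has no other columns left to count.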
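To make the first point concrete, here is a minimal sketch that compares the dtype of the column with and without .time. The variable names (stamps, with_times, with_stamps) are illustrative and not from the question.

```python
import datetime

import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

stamps = pd.date_range("11:00", "21:30", freq="30min")

# .time returns an array of plain datetime.time objects
with_times = pd.DataFrame({"Times": stamps.time})
# keeping the DatetimeIndex preserves a datetime64 dtype
with_stamps = pd.DataFrame({"Times": stamps})

print(with_times["Times"].dtype)                       # object
print(type(with_times["Times"].iloc[0]))               # <class 'datetime.time'>
print(is_datetime64_any_dtype(with_stamps["Times"]))   # True
```

Only the second frame holds values that Grouper can bin by frequency.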

We can fix the first issue by turning the time column into a datetime:

timeIntervals = pd.date_range("11:00", "21:30", freq="30min")  # no .time here, so the values stay datetimes
df = pd.DataFrame({"Times": timeIntervals})

If the values do not come from a date_range (for example, they are read from a file), we can convert the column with pd.to_datetime instead:

df['Times'] = pd.to_datetime(df['Times'], format='%H:%M:%S')
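As a rough illustration of what that conversion produces, the sketch below builds a small column of datetime.time values by hand (the sample values are made up). The astype(str) step is a defensive addition of mine, not part of the line above: it assumes the times have no fractional seconds, so each value renders as HH:MM:SS before parsing.

```python
import datetime

import pandas as pd

# Hypothetical stand-in for a time column read from a file
df = pd.DataFrame(
    {"Times": [datetime.time(11, 0), datetime.time(11, 30), datetime.time(14, 0)]}
)

# Render each time as "HH:MM:SS", then parse it; with no date in the
# format, pandas fills in the placeholder date 1900-01-01
df["Times"] = pd.to_datetime(df["Times"].astype(str), format="%H:%M:%S")

print(df["Times"].iloc[0])  # 1900-01-01 11:00:00
```

The date part is only a placeholder, which is why the index is reduced back to times after grouping.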

Then we can groupby and count:

value = df.groupby(pd.Grouper(key="Times", freq="3H"))['Times'].count()

If needed we can update the index to only reflect the time after grouping:

value.index = value.index.time

As a result value becomes:

09:00:00    2
12:00:00    6
15:00:00    6
18:00:00    6
21:00:00    2
Name: Times, dtype: int64
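The first label is 09:00 even though the data start at 11:00 because, by default, the 3-hour bins are anchored at midnight of the first day rather than at the first timestamp. As one possible alternative to Grouper, flooring each timestamp to a 3-hour boundary and grouping on the result should give the same counts. The sketch below assumes the column already holds datetimes; the explicit 1900-01-01 date and the names by_grouper, by_floor and "bin" are illustrative. It uses the lowercase "3h" alias, since recent pandas releases deprecate the uppercase "H" spelling.

```python
import pandas as pd

stamps = pd.date_range("1900-01-01 11:00", "1900-01-01 21:30", freq="30min")
df = pd.DataFrame({"Times": stamps})

# Grouper: bins are anchored at midnight of the first day by default
by_grouper = df.groupby(pd.Grouper(key="Times", freq="3h"))["Times"].count()

# Alternative: floor every timestamp to its 3-hour boundary, then group
bins = df["Times"].dt.floor("3h").rename("bin")
by_floor = df.groupby(bins)["Times"].count()

print(by_grouper.tolist())  # [2, 6, 6, 6, 2]
print(by_floor.tolist())    # [2, 6, 6, 6, 2]
```

One difference to keep in mind: the floor approach only produces bins that contain data, whereas Grouper also emits empty bins between the first and last timestamps.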

All together with to_datetime:

def main():
    time_intervals = pd.date_range("11:00", "21:30", freq="30min").time
    df = pd.DataFrame({"Times": time_intervals})
    # Convert to DateTime
    df['Times'] = pd.to_datetime(df['Times'], format='%H:%M:%S')
    # Group and count specific column
    value = df.groupby(pd.Grouper(key="Times", freq="3H"))['Times'].count()
    # Retrieve only Time information
    value.index = value.index.time
    print(value)

Or, without extracting .time before creating the DataFrame:

def main():
    time_intervals = pd.date_range("11:00", "21:30", freq="30min")
    df = pd.DataFrame({"Times": time_intervals})
    value = df.groupby(pd.Grouper(key="Times", freq="3H"))['Times'].count()
    value.index = value.index.time
    print(value)