I am reading from an Excel file that has a column with times. Since I can't upload the actual file, I created the variable timeIntervals to illustrate.
When I run this code...
import pandas as pd
import datetime
from pyPython import *
def main():
    timeIntervals = pd.date_range("11:00", "21:30", freq="30min").time
    df = pd.DataFrame({"Times": timeIntervals})
    grp = pd.Grouper(key="Times", freq="3H")
    value = df.groupby(grp).count()
    print(value)

if __name__ == '__main__':
    main()
I get the following error:
TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index'
How can I use pandas.Grouper in combination with DataFrame.groupby to "group" the dataframe df into discrete time ranges (3 hours)? Are there other alternatives?
CodePudding user response:
A few issues:
- A date_range cannot be reduced to just time without losing the datatype required for resampling on a time window.
- count counts the non-NaN values in a column, so a column must be selected explicitly, since the sample frame has no columns left once Times is used as the grouping key.
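Both issues can be checked directly. The following is only a minimal sketch with illustrative variable names; it uses the lowercase "3h" alias rather than "3H", on the assumption that a recent pandas release is installed (newer versions warn about the uppercase form).

```python
import pandas as pd

# Full timestamps: pandas stores these with a datetime64 dtype.
stamps = pd.date_range("11:00", "21:30", freq="30min")
print(stamps.dtype)

# Reducing to .time yields plain datetime.time objects in an object array,
# which Grouper cannot resample.
times_only = stamps.time
print(times_only.dtype)

# Grouping on the only column leaves nothing for count() to report.
df = pd.DataFrame({"Times": stamps})
empty_counts = df.groupby(pd.Grouper(key="Times", freq="3h")).count()
print(empty_counts.shape)

# Selecting the column explicitly gives one count per 3-hour bin.
counts = df.groupby(pd.Grouper(key="Times", freq="3h"))["Times"].count()
print(counts.tolist())
```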
We can fix the first issue by turning the time column into a datetime:
timeIntervals = pd.date_range("11:00", "21:30", freq="30min") # remove time here
df = pd.DataFrame({"Times": timeIntervals})
If we are not creating these values from a date_range, we can simply convert the column with to_datetime:
df['Times'] = pd.to_datetime(df['Times'], format='%H:%M:%S')
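For values read from Excel, the column may arrive as datetime.time objects rather than strings. Here is a minimal sketch of the conversion under that assumption; the sample data is a hypothetical stand-in for the real sheet. Casting to str first is an extra precaution that is not in the snippet above, and it assumes the times carry no fractional seconds. Because the format has no date part, pandas fills in 1900-01-01.

```python
import datetime
import pandas as pd

# Hypothetical stand-in for a column of times read from an Excel sheet.
df = pd.DataFrame(
    {"Times": [datetime.time(11, 0), datetime.time(11, 30), datetime.time(14, 0)]}
)

# Cast to text first (assumption: no fractional seconds), then parse.
df["Times"] = pd.to_datetime(df["Times"].astype(str), format="%H:%M:%S")

print(df["Times"].dtype)    # now a datetime64 dtype
print(df["Times"].iloc[0])  # the date part defaults to 1900-01-01
```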
Then we can groupby and count:
value = df.groupby(pd.Grouper(key="Times", freq="3H"))['Times'].count()
If needed, we can update the index to only reflect the time after grouping:
value.index = value.index.time
As a result, value becomes:
09:00:00    2
12:00:00    6
15:00:00    6
18:00:00    6
21:00:00    2
Name: Times, dtype: int64
All together with to_datetime:
def main():
    time_intervals = pd.date_range("11:00", "21:30", freq="30min").time
    df = pd.DataFrame({"Times": time_intervals})
    # Convert to DateTime
    df['Times'] = pd.to_datetime(df['Times'], format='%H:%M:%S')
    # Group and count specific column
    value = df.groupby(pd.Grouper(key="Times", freq="3H"))['Times'].count()
    # Retrieve only Time information
    value.index = value.index.time
    print(value)
Or, without extracting the time before creating the DataFrame:
def main():
    time_intervals = pd.date_range("11:00", "21:30", freq="30min")
    df = pd.DataFrame({"Times": time_intervals})
    value = df.groupby(pd.Grouper(key="Times", freq="3H"))['Times'].count()
    value.index = value.index.time
    print(value)
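On the question of alternatives, one possible sketch (not part of the answer above) floors each timestamp to the start of its 3-hour window with Series.dt.floor and then counts the occurrences. Unlike Grouper, this only lists windows that contain at least one row, so empty windows would be missing rather than reported as 0.

```python
import pandas as pd

time_intervals = pd.date_range("11:00", "21:30", freq="30min")
df = pd.DataFrame({"Times": time_intervals})

# Floor every timestamp down to the start of its 3-hour window.
bins = df["Times"].dt.floor("3h")

# Count rows per window, ordered by window start.
value = bins.value_counts().sort_index()

# Keep only the time-of-day part in the index.
value.index = value.index.time
print(value)
```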