My dataframe is shown above. Its dtypes are:
weekday int64
date datetime64[ns]
time object
customers int64
dtype: object
I'd like the customers column to hold the number of customers who arrived in the past 2 hours, based on the timestamps stored in the date column. However, using the pandas rolling functionality, I can only write
df['customers'] = df['date'].rolling(2).count()
This only counts the previous two rows and completely disregards the datetime values. I'd like to write
df['customers'] = df['date'].rolling('2H').count() #desired: 2H
to get the correct result. However, I'm getting ValueError: window must be an integer. According to the pandas rolling documentation (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.rolling.html), a datetime column should accept a time-based rolling window. I have no idea why my datetime column cannot use this functionality.
CodePudding user response:
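A time-based window such as '2H' only works when the rolling object has a sorted datetime-like index (or a datetime column passed via the on parameter). df['date'] is a Series with a plain integer index, hence the error. Create a sorted DatetimeIndex first: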
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date').sort_index()
df['customers'] = df['customers'].rolling('2H').count()
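For reference, here is a minimal self-contained sketch of the same idea. The sample data is made up, since the original dataframe isn't reproduced here, and the names by_time, arrivals_2h, customers_2h and alt are just illustrative. It uses the lowercase '2h' alias, which as far as I know is accepted by both older and newer pandas versions (newer ones warn on '2H'). It also shows .sum() next to .count(), in case each row holds a number of customers rather than a single arrival, and the on='date' variant if you'd rather keep date as a regular column.

```python
import pandas as pd

# Hypothetical sample data; the real dataframe from the question is not shown.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2021-01-04 09:00", "2021-01-04 09:30", "2021-01-04 10:45",
        "2021-01-04 11:10", "2021-01-04 14:00",
    ]),
    "customers": [3, 1, 2, 4, 5],
})

# Time-based windows need a sorted datetime-like index.
by_time = df.set_index("date").sort_index()

# Number of rows (arrival records) in the 2 hours up to and including each timestamp.
arrivals_2h = by_time["customers"].rolling("2h").count()

# Total of the customers values over the same 2-hour window.
customers_2h = by_time["customers"].rolling("2h").sum()

# Alternative: keep 'date' as a column and point rolling at it with on=.
alt = df.sort_values("date").rolling("2h", on="date")["customers"].sum()

print(arrivals_2h)
print(customers_2h)
```

By default the window is open on the left and closed on the right, so a row exactly 2 hours older than the current timestamp is not included.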