Home > Mobile >  Group data by Hourly basis in pandas
Group data by Hourly basis in pandas

Time:11-16

I have 3 columns and the data types of all 3 columns are object. First column is Date column and 3rd column is values

Something like this-

Date           Values    Country 

01/01/21 12:00   2.       India
01/01/21 12:15   4.      India 
01/01/21 12:30   6.       India
01/01/21 12:45   8.     India

01/01/21 1:00.    10.   India
01/01/21 1:15.    20.   India
01/01/21 1:30.    30.   India
01/01/21 1:45.   40.     India

Date is from past 1 year and there are 20000 records

I want to change the time from 15 minutes bucket to hourly bucket So that the output becomes something like this

Date.             Values.   

01/01/21 12:00.   5.     India

01/01/21 1:00.    25.     India

Values is the avg

Tried to resample this data but getting error Only valid with datetimeindex,TimedeltaIndex or PeriodIndex but got an instance of RangeIndex

CodePudding user response:

You can use resample with agg:

df['Date'] = pd.to_datetime(df['Date'])

out = (df.resample('1h', on='Date')
         .agg({'Values': 'mean', 'Country': 'first'}).dropna()
      )

Output:

                     Values Country
Date                               
2021-01-01 01:00:00    25.0   India
2021-01-01 12:00:00     5.0   India

Or groupby.agg after rounding the datetime with dt.floor:

out = (df.groupby(df['Date'].dt.floor('1h'), as_index=False)
         .agg({'Date': 'first', 'Values': 'mean', 'Country': 'first'})
       )

Output:

                 Date  Values Country
0 2021-01-01 01:00:00    25.0   India
1 2021-01-01 12:00:00     5.0   India

CodePudding user response:

Start by changing the Date column to datetime and DatetimeIndex by:

#convert Date column to datetime
df['Date'] = pd.to_datetime(df.Date)

#convert Date column to DateTimeIndex
dti = pd.DatetimeIndex(df.Date.values)

#set index to DateTimeIndex and drop the Date column
df = df.set_index(dti).drop('Date',axis=1)

#resample df to 1hr data, and aggregate by mean
df_resampled = (df.resample('1h')
       .agg({'Values': 'mean', 'Country': 'first'})
       .dropna()
      )

print(df_resampled):
                        Values  Country
2021-01-01 01:00:00     25.0    India
2021-01-01 12:00:00     5.0     India
  • Related