I have 3 columns and the data types of all 3 columns are object. First column is Date column and 3rd column is values
Something like this-
Date Values Country
01/01/21 12:00 2. India
01/01/21 12:15 4. India
01/01/21 12:30 6. India
01/01/21 12:45 8. India
01/01/21 1:00. 10. India
01/01/21 1:15. 20. India
01/01/21 1:30. 30. India
01/01/21 1:45. 40. India
Date is from past 1 year and there are 20000 records
I want to change the time from 15 minutes bucket to hourly bucket So that the output becomes something like this
Date. Values.
01/01/21 12:00. 5. India
01/01/21 1:00. 25. India
Values is the avg
Tried to resample this data but getting error Only valid with datetimeindex,TimedeltaIndex or PeriodIndex but got an instance of RangeIndex
CodePudding user response:
You can use resample
with agg
:
df['Date'] = pd.to_datetime(df['Date'])
out = (df.resample('1h', on='Date')
.agg({'Values': 'mean', 'Country': 'first'}).dropna()
)
Output:
Values Country
Date
2021-01-01 01:00:00 25.0 India
2021-01-01 12:00:00 5.0 India
Or groupby.agg
after rounding the datetime with dt.floor
:
out = (df.groupby(df['Date'].dt.floor('1h'), as_index=False)
.agg({'Date': 'first', 'Values': 'mean', 'Country': 'first'})
)
Output:
Date Values Country
0 2021-01-01 01:00:00 25.0 India
1 2021-01-01 12:00:00 5.0 India
CodePudding user response:
Start by changing the Date
column to datetime
and DatetimeIndex
by:
#convert Date column to datetime
df['Date'] = pd.to_datetime(df.Date)
#convert Date column to DateTimeIndex
dti = pd.DatetimeIndex(df.Date.values)
#set index to DateTimeIndex and drop the Date column
df = df.set_index(dti).drop('Date',axis=1)
#resample df to 1hr data, and aggregate by mean
df_resampled = (df.resample('1h')
.agg({'Values': 'mean', 'Country': 'first'})
.dropna()
)
print(df_resampled):
Values Country
2021-01-01 01:00:00 25.0 India
2021-01-01 12:00:00 5.0 India