Home > Blockchain >  How to aggregate values in Pandas?
How to aggregate values in Pandas?

Time:12-24

I have a pandas dataframe that you can see in this screenshot:

It has a time resolution of 1 minute. I would like to aggregate the first 5 columns and calculate the rolling average and for the 5th column ("Availability of the EV") I would just like to take the value every 5 minutes. For this I have the following code:

import pandas as pd


originalData = pd.read_csv('C:/Users/Desktop/Test_1Minute.csv', sep =";")
originalData['Time'] = pd.to_datetime(originalData['Time'], format = '%d.%m.%Y %H:%M')

adjustedData = originalData.set_index('Time').asfreq('5Min')
#Aggregate
adjustedData = originalData['Space Heating [W]'].resample('5Min').mean()
adjustedData['Electricity [W]'] = adjustedData['Electricity [W]'].resample('5Min').mean()
adjustedData['PV [nominal]'] = adjustedData['PV [nominal]'].resample('5Min').mean()
adjustedData['DHW [W]'] = adjustedData['DHW [W]'].resample('5Min').mean()

#Write new dataframe to csv 
adjustedData.to_csv('C:/Users/Desktop/Test_5Minutes.csv', sep =";") 

enter image description here

When I run this code I get the following error

"TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'RangeIndex'"

Do you know why this issue occurs and how to solve it?

CodePudding user response:

The problem with your provided code is you set the index on a new dataframe adjustedData, but then resample the originalData. Below is the original resampled code (which is not a rolling mean!), that aggregates all columns; and an actual rolling mean for the first columns, retaining the original values for the last column.

import pandas as pd


originalData = pd.read_csv('C:/Users/Desktop/Test_1mintue.csv', sep =";"), sep =";")
originalData['Time'] = pd.to_datetime(originalData['Time'], format = '%d.%m.%Y %H:%M')

# grouping by 5 min windows:
adjustedData = originalData.set_index('Time').resample("5Min").mean()

# using rolling mean for specific columns
mean_cols = ['Space Heating [W]', 'Electricity [W]','PV [nominal]', 'DHW [W]']
adjustedData = originalData.set_index('Time').assign(**{col: lambda dataf: dataf[col].rolling('5Min').mean() for col in mean_cols})

#Write new dataframe to csv 
adjustedData.to_csv('C:/Users/Desktop/Test_5mintues.csv', sep =";") 
  • Related