I have a data frame with the date/time passed as "parse_dates" and then set as the index column for the data frame.
Flow Enter Leave kWE kW
Time
2021-08-30 08:50:00 24.22 14.0 5.7 66.56 619.478012
2021-08-30 08:51:00 24.22 14.0 5.7 66.56 619.478012
I want to get the average values of columns 'kWE' and 'Flow' per hour and day though i'm unsure of how to extract the data when the date/time is set as the index.
CodePudding user response:
You can use resample
with hourly (h
) frequency:
cols = ['Flow', 'kWE']
df.resample('h')[cols].mean()
output:
Flow kWE
Time
2021-08-30 08:00:00 24.22 66.56
Or using origin='start'
to use the first value as starting point.
Note the different index. Here the result is the same but this might give a different output on larger data depending on how rows will cluster:
df.resample('h', origin='start')[cols].mean()
output:
Flow kWE
Time
2021-08-30 08:50:00 24.22 66.56
CodePudding user response:
Use:
#Preparing data
string = """Time Flow Enter Leave kWE kW
2021-08-30 08:50:00 24.22 14.0 5.7 66.56 619.478012
2021-08-30 08:51:00 24.22 14.0 5.7 66.56 619.478012"""
data = [x.split(' ') for x in string.split('\n')]
df = pd.DataFrame(data[1:], columns = data[0])
df['Time']=pd.to_datetime(df['Time'])
df = df.set_index('Time')
#Solution, daily and hourly averages:
df.groupby(df.index.day)[['kWE', 'Flow']].mean()
df.groupby(df.index.hour)[['kWE', 'Flow']].mean()