I have 2 dataframes, one have a step of 30 minutes, the other one have a step of 1 hour.
I want to compare the 2 dataframe, and the first (freq 30 min) is in kiloWatt by hour, but on 2 line of 30 minutes.
Like this :
2021-05-01 03:00:00, 0.701
2021-05-01 03:30:00, 1.556
2021-05-01 04:00:00, 0.799
2021-05-01 04:30:00, 0.738
But in real life, this should have the form :
2021-05-01 03:00:00, 2.347
2021-05-01 04:00:00, 1.537
Line 1 and 2 are summed and line 3 and 4 are summed and there is only one measure by hour to have effectively kilowatt by hour.
How to sum each 2 line in the dataframe ?
CodePudding user response:
Assuming date
and value
the column names.
Aggregation on position:
out = df.groupby(df.index//2).agg({'date': 'first', 'value': 'sum'})
Aggregation per hour:
df['date'] = pd.to_datetime(df['date'])
out = (df.groupby(df['date'].dt.floor('h'), as_index=False)
.agg({'date': 'first', 'value': 'sum'})
)
output:
date value
0 2021-05-01 03:00:00 2.257
1 2021-05-01 04:00:00 1.537
used input:
date value
0 2021-05-01 03:00:00 0.701
1 2021-05-01 03:30:00 1.556
2 2021-05-01 04:00:00 0.799
3 2021-05-01 04:30:00 0.738
date as index:
df.index = pd.to_datetime(df.index)
out = df.groupby(df.index.floor('h')).sum().reset_index()
or:
df.index = pd.to_datetime(df.index)
out = (df
.assign(date=df.index).groupby(df.index.floor('h'), as_index=False)
.agg({'date': 'first', 'value': 'sum'})
)
CodePudding user response:
You can try resample
then sum
df['time'] = pd.to_datetime(df['time'])
out = df.resample('H', on='time')[['val']].sum().reset_index()
print(out)
time val
0 2021-05-01 03:00:00 2.257
1 2021-05-01 04:00:00 1.537