Home > Net >  Python dataframe sum each 2 line
Python dataframe sum each 2 line

Time:06-13

I have 2 dataframes, one have a step of 30 minutes, the other one have a step of 1 hour.

I want to compare the 2 dataframe, and the first (freq 30 min) is in kiloWatt by hour, but on 2 line of 30 minutes.

Like this :

2021-05-01 03:00:00, 0.701
2021-05-01 03:30:00, 1.556
2021-05-01 04:00:00, 0.799
2021-05-01 04:30:00, 0.738

But in real life, this should have the form :

2021-05-01 03:00:00, 2.347
2021-05-01 04:00:00, 1.537

Line 1 and 2 are summed and line 3 and 4 are summed and there is only one measure by hour to have effectively kilowatt by hour.

How to sum each 2 line in the dataframe ?

CodePudding user response:

Assuming date and value the column names.

Aggregation on position:

out = df.groupby(df.index//2).agg({'date': 'first', 'value': 'sum'})

Aggregation per hour:

df['date'] = pd.to_datetime(df['date'])
out = (df.groupby(df['date'].dt.floor('h'), as_index=False)
         .agg({'date': 'first', 'value': 'sum'})
       )

output:

                 date  value
0 2021-05-01 03:00:00  2.257
1 2021-05-01 04:00:00  1.537

used input:

                 date  value
0 2021-05-01 03:00:00  0.701
1 2021-05-01 03:30:00  1.556
2 2021-05-01 04:00:00  0.799
3 2021-05-01 04:30:00  0.738

date as index:

df.index = pd.to_datetime(df.index)
out = df.groupby(df.index.floor('h')).sum().reset_index()

or:

df.index = pd.to_datetime(df.index)
out = (df
 .assign(date=df.index).groupby(df.index.floor('h'), as_index=False)
 .agg({'date': 'first', 'value': 'sum'})
)

CodePudding user response:

You can try resample then sum

df['time'] = pd.to_datetime(df['time'])

out = df.resample('H', on='time')[['val']].sum().reset_index()
print(out)

                 time    val
0 2021-05-01 03:00:00  2.257
1 2021-05-01 04:00:00  1.537
  • Related