Pandas sum (values from minutes to hour) time data values with Pandas-CodePudding

I have a question: I want to sum time data values with Pandas. I have data in 15 minutes step and I want to sum the values for one hour.

For example: My Data

Index	Date	Time	Value
0	11.06.2021	04:00	1125.6
1	11.06.2021	04:15	5622.2
2	11.06.2021	04:30	3222.6
3	11.06.2021	04:45	2666.7
4	11.06.2021	05:00	4657.2
5	11.06.2021	05:15	2946.8
6	11.06.2021	05:30	3255.4
7	11.06.2021	05:45	...
8	11.06.2021	06:00	...
9	11.06.2021	06:15	...
10	11.06.2021	06:30	...
...	...	...	...

I want to have:

Index	Date	Time	Value
0	11.06.2021	4:00	The sum from 4:00 to 4:45
1	11.06.2021	5:00	...
2	11.06.2021	6:00	...

This data is called test:

My soulution:

result = []
result1 = []
counter11 = 0
for index, row in test.iterrows():
    counter11  = 1
    print(counter11)
    result1  = test.values[index]
    result  = result1
    if counter11 == 3:
        result.add(result.values)
        result = 0
        counter = 0
test["sum"] = result

I would be happy if somebody can help me. Thank you.

CodePudding user response：

Using:

concatenate Date and Time columns
pandas.to_datetime to convert to datetime
set_index to the Datetime
resample for hours
sum with min_count=1 to o distinguish when there is no value for the period.

import pandas as pd

# creating the dataframe
import io
data = """
Index,Date,Time,Value
0,11.06.2021,04:00,1125.6
1,11.06.2021,04:15,5622.2
2,11.06.2021,04:30,3222.6
3,11.06.2021,04:45,2666.7
4,11.06.2021,05:00,4657.2
5,11.06.2021,05:15,2946.8
6,11.06.2021,05:30,3255.4
7,11.06.2021,07:30,1111.1
"""
df = pd.read_csv(io.StringIO(data), sep=',', usecols=['Date', 'Time', 'Value'])

# processing
df['Datetime']= pd.to_datetime(df['Date'] ' ' df['Time'])
df.drop(['Date', 'Time'], axis=1, inplace= True)
df.set_index('Datetime', inplace= True)
print(df.resample('H').sum(min_count=1))

# output
                       Value
Datetime                    
2021-11-06 04:00:00  12637.1
2021-11-06 05:00:00  10859.4
2021-11-06 06:00:00      NaN
2021-11-06 07:00:00   1111.1

CodePudding user response：

if you are using pandas I would like to make two suggestions.

First, you should make use of pandas builtin datetime, you can find everything about it in the docs: https://pandas.pydata.org/docs/user_guide/timeseries.html?highlight=datetime

This will make your life much easier when working with dates and times. You can use the datetime objects to create and index as in my code example below.

The second suggestions is, that once you are making use of pandas datetime, you can actually use other builtin functions such as resampling. This will 'group' the data in your dataframe based on some logic you provide to the function, such as resampling it by hour. Then you can apply functions to that resampled data such as sum().

import pandas as pd
#This is just for making this example, you will not need numpy
import numpy as np

"This creates a datetime index"
idx = pd.date_range("2018-01-01", periods=70, freq="T")

"Creating a sample dataframe to work with"
df = pd.DataFrame(np.repeat(1,70), index=idx, columns= ["Data"])

"""
This is what you are trying to do:
Resampling or 'grouping' the data by hours (in this case) and sums all
values for each hour.
"""
df.resample("H").sum()