Home > OS >  Pandas sum (values from minutes to hour) time data values with Pandas
Pandas sum (values from minutes to hour) time data values with Pandas

Time:12-12

I have a question: I want to sum time data values with Pandas. I have data in 15 minutes step and I want to sum the values for one hour.

For example: My Data

Index Date Time Value
0 11.06.2021 04:00 1125.6
1 11.06.2021 04:15 5622.2
2 11.06.2021 04:30 3222.6
3 11.06.2021 04:45 2666.7
4 11.06.2021 05:00 4657.2
5 11.06.2021 05:15 2946.8
6 11.06.2021 05:30 3255.4
7 11.06.2021 05:45 ...
8 11.06.2021 06:00 ...
9 11.06.2021 06:15 ...
10 11.06.2021 06:30 ...
... ... ... ...

I want to have:

Index Date Time Value
0 11.06.2021 4:00 The sum from 4:00 to 4:45
1 11.06.2021 5:00 ...
2 11.06.2021 6:00 ...

This data is called test:

My soulution:

result = []
result1 = []
counter11 = 0
for index, row in test.iterrows():
    counter11  = 1
    print(counter11)
    result1  = test.values[index]
    result  = result1
    if counter11 == 3:
        result.add(result.values)
        result = 0
        counter = 0
test["sum"] = result

I would be happy if somebody can help me. Thank you.

CodePudding user response:

Using:

  • concatenate Date and Time columns
  • pandas.to_datetime to convert to datetime
  • set_index to the Datetime
  • resample for hours
  • sum with min_count=1 to o distinguish when there is no value for the period.
import pandas as pd

# creating the dataframe
import io
data = """
Index,Date,Time,Value
0,11.06.2021,04:00,1125.6
1,11.06.2021,04:15,5622.2
2,11.06.2021,04:30,3222.6
3,11.06.2021,04:45,2666.7
4,11.06.2021,05:00,4657.2
5,11.06.2021,05:15,2946.8
6,11.06.2021,05:30,3255.4
7,11.06.2021,07:30,1111.1
"""
df = pd.read_csv(io.StringIO(data), sep=',', usecols=['Date', 'Time', 'Value'])

# processing
df['Datetime']= pd.to_datetime(df['Date'] ' ' df['Time'])
df.drop(['Date', 'Time'], axis=1, inplace= True)
df.set_index('Datetime', inplace= True)
print(df.resample('H').sum(min_count=1))

# output
                       Value
Datetime                    
2021-11-06 04:00:00  12637.1
2021-11-06 05:00:00  10859.4
2021-11-06 06:00:00      NaN
2021-11-06 07:00:00   1111.1

CodePudding user response:

if you are using pandas I would like to make two suggestions.

First, you should make use of pandas builtin datetime, you can find everything about it in the docs: https://pandas.pydata.org/docs/user_guide/timeseries.html?highlight=datetime

This will make your life much easier when working with dates and times. You can use the datetime objects to create and index as in my code example below.

The second suggestions is, that once you are making use of pandas datetime, you can actually use other builtin functions such as resampling. This will 'group' the data in your dataframe based on some logic you provide to the function, such as resampling it by hour. Then you can apply functions to that resampled data such as sum().

import pandas as pd
#This is just for making this example, you will not need numpy
import numpy as np

"This creates a datetime index"
idx = pd.date_range("2018-01-01", periods=70, freq="T")

"Creating a sample dataframe to work with"
df = pd.DataFrame(np.repeat(1,70), index=idx, columns= ["Data"])

"""
This is what you are trying to do:
Resampling or 'grouping' the data by hours (in this case) and sums all
values for each hour.
"""
df.resample("H").sum()
  • Related