I have a question: I want to sum time data values with Pandas. I have data in 15 minutes step and I want to sum the values for one hour.
For example: My Data
Index | Date | Time | Value |
---|---|---|---|
0 | 11.06.2021 | 04:00 | 1125.6 |
1 | 11.06.2021 | 04:15 | 5622.2 |
2 | 11.06.2021 | 04:30 | 3222.6 |
3 | 11.06.2021 | 04:45 | 2666.7 |
4 | 11.06.2021 | 05:00 | 4657.2 |
5 | 11.06.2021 | 05:15 | 2946.8 |
6 | 11.06.2021 | 05:30 | 3255.4 |
7 | 11.06.2021 | 05:45 | ... |
8 | 11.06.2021 | 06:00 | ... |
9 | 11.06.2021 | 06:15 | ... |
10 | 11.06.2021 | 06:30 | ... |
... | ... | ... | ... |
I want to have:
Index | Date | Time | Value |
---|---|---|---|
0 | 11.06.2021 | 4:00 | The sum from 4:00 to 4:45 |
1 | 11.06.2021 | 5:00 | ... |
2 | 11.06.2021 | 6:00 | ... |
This data is called test
:
My soulution:
result = []
result1 = []
counter11 = 0
for index, row in test.iterrows():
counter11 = 1
print(counter11)
result1 = test.values[index]
result = result1
if counter11 == 3:
result.add(result.values)
result = 0
counter = 0
test["sum"] = result
I would be happy if somebody can help me. Thank you.
CodePudding user response:
Using:
- concatenate
Date
andTime
columns pandas.to_datetime
to convert to datetimeset_index
to the Datetimeresample
for hourssum
withmin_count=1
to o distinguish when there is no value for the period.
import pandas as pd
# creating the dataframe
import io
data = """
Index,Date,Time,Value
0,11.06.2021,04:00,1125.6
1,11.06.2021,04:15,5622.2
2,11.06.2021,04:30,3222.6
3,11.06.2021,04:45,2666.7
4,11.06.2021,05:00,4657.2
5,11.06.2021,05:15,2946.8
6,11.06.2021,05:30,3255.4
7,11.06.2021,07:30,1111.1
"""
df = pd.read_csv(io.StringIO(data), sep=',', usecols=['Date', 'Time', 'Value'])
# processing
df['Datetime']= pd.to_datetime(df['Date'] ' ' df['Time'])
df.drop(['Date', 'Time'], axis=1, inplace= True)
df.set_index('Datetime', inplace= True)
print(df.resample('H').sum(min_count=1))
# output
Value
Datetime
2021-11-06 04:00:00 12637.1
2021-11-06 05:00:00 10859.4
2021-11-06 06:00:00 NaN
2021-11-06 07:00:00 1111.1
CodePudding user response:
if you are using pandas I would like to make two suggestions.
First, you should make use of pandas builtin datetime, you can find everything about it in the docs: https://pandas.pydata.org/docs/user_guide/timeseries.html?highlight=datetime
This will make your life much easier when working with dates and times. You can use the datetime objects to create and index as in my code example below.
The second suggestions is, that once you are making use of pandas datetime, you can actually use other builtin functions such as resampling. This will 'group' the data in your dataframe based on some logic you provide to the function, such as resampling it by hour. Then you can apply functions to that resampled data such as sum().
import pandas as pd
#This is just for making this example, you will not need numpy
import numpy as np
"This creates a datetime index"
idx = pd.date_range("2018-01-01", periods=70, freq="T")
"Creating a sample dataframe to work with"
df = pd.DataFrame(np.repeat(1,70), index=idx, columns= ["Data"])
"""
This is what you are trying to do:
Resampling or 'grouping' the data by hours (in this case) and sums all
values for each hour.
"""
df.resample("H").sum()