I have a 30 minutes interval of the data and I want to change them to houlry data. I have date, id and the value of the data in one data frame. For example, in the following data frame: I have value for id=1 at 8 and 8:30, so I sum the val (10 10) for this id and keep the first time. My time shoud be integer, 8,9,10,11,12,13,14 and ... For id=2, I have the value for 8:00 and 9:00, so will keep those two. For id= 3, I have three values for one day, and I have another value for another day. So, for the first day I sum the values of 8:00 and 8:30. and keep the values for 9:00 and for the next day, I chose it values. Also for the id=4 we have different times and we do same as before. Here is the df:
import pandas as pd
df = pd.DataFrame()
df['id'] = [1, 1, 2, 2,
3,3,3,3,4,4,4,4,4,4]
df['date'] = ['2012-02-10
8:00:00','2012-02-10 8:30:00','2012-
02-10 8:00:00','2012-02-
10 9:00:00','2012-02-10
8:00:00', '2012-02-10
8:30:00', '2012-02-10
9:00:00','2012-02-11
9:30:00','2012-02-11
9:00:00','2012-02-11
9:30:00','2012-02-11
10:00:00','2012-02-11
11:00:00','2012-02-11
11:30:00','2012-02-11
12:00:00' ]
df [ 'val'] = [10,10, 2,
2, 3,3, 4, 5,1,2,3,4,5,6]
And here is the output:
id date val
1 2012-02-10 8:00:00 20
2 2012-02-10 8:00:00 2
2 2012-02-10 9:00:00 2
3 2012-02-10 8:00:00 6
3 2012-02-10 9:00:00 4
3 2012-02-11 9:30:00 5
4 2012-02-11 9:00:00 3
4 2012-02-11 10:00:00 3
4 2012-02-11 11:00:00 9
4 2012-02-11 12:00:00 6
CodePudding user response:
You can achieve the result as follows.
- First, let's use
pd.to_datetime
to turndf['date']
into a pandas datetime object. - Next, we apply
Series.dt.floor
to reduce each timestamp to full hours. - We want to combine the resulting series with
id
insidedf.groupby
to group our data. - Finally, we apply
DataFrameGroupBy.agg
to retrieve the first value from the originaldate
column, and the sum from the columnval
.
out = df.copy()
out['date'] = pd.to_datetime(out['date'])
out = out.groupby(['id',out['date'].\
dt.floor(freq='H')], as_index=False)\
.agg({'date':'first','val':'sum'})
print(out)
id date val
0 1 2012-02-10 08:00:00 20
1 2 2012-02-10 08:00:00 2
2 2 2012-02-10 09:00:00 2
3 3 2012-02-10 08:00:00 6
4 3 2012-02-10 09:00:00 4
5 3 2012-02-11 09:30:00 5
6 4 2012-02-11 09:00:00 3
7 4 2012-02-11 10:00:00 3
8 4 2012-02-11 11:00:00 9
9 4 2012-02-11 12:00:00 6