Home > Software design >  Change the 30 minutes of data to hourly data
Change the 30 minutes of data to hourly data

Time:09-04

I have a 30 minutes interval of the data and I want to change them to houlry data. I have date, id and the value of the data in one data frame. For example, in the following data frame: I have value for id=1 at 8 and 8:30, so I sum the val (10 10) for this id and keep the first time. My time shoud be integer, 8,9,10,11,12,13,14 and ... For id=2, I have the value for 8:00 and 9:00, so will keep those two. For id= 3, I have three values for one day, and I have another value for another day. So, for the first day I sum the values of 8:00 and 8:30. and keep the values for 9:00 and for the next day, I chose it values. Also for the id=4 we have different times and we do same as before. Here is the df:

import pandas as pd
df = pd.DataFrame()
df['id'] = [1, 1, 2, 2, 
 3,3,3,3,4,4,4,4,4,4]
df['date'] = ['2012-02-10 
 8:00:00','2012-02-10             8:30:00','2012- 
 02-10 8:00:00','2012-02- 
 10 9:00:00','2012-02-10 
 8:00:00', '2012-02-10 
 8:30:00', '2012-02-10 
 9:00:00','2012-02-11 
 9:30:00','2012-02-11 
 9:00:00','2012-02-11 
 9:30:00','2012-02-11 
 10:00:00','2012-02-11 
 11:00:00','2012-02-11 
 11:30:00','2012-02-11 
 12:00:00' ]
df [ 'val'] = [10,10, 2, 
2, 3,3, 4, 5,1,2,3,4,5,6]

And here is the output:

id      date          val               
1    2012-02-10 8:00:00    20
2    2012-02-10 8:00:00    2
2    2012-02-10 9:00:00    2
3    2012-02-10 8:00:00    6
3    2012-02-10 9:00:00    4
3    2012-02-11 9:30:00    5
4    2012-02-11 9:00:00    3
4    2012-02-11 10:00:00    3
4    2012-02-11 11:00:00    9
4    2012-02-11 12:00:00    6

CodePudding user response:

You can achieve the result as follows.

  • First, let's use pd.to_datetime to turn df['date'] into a pandas datetime object.
  • Next, we apply Series.dt.floor to reduce each timestamp to full hours.
  • We want to combine the resulting series with id inside df.groupby to group our data.
  • Finally, we apply DataFrameGroupBy.agg to retrieve the first value from the original date column, and the sum from the column val.
out = df.copy()
out['date'] = pd.to_datetime(out['date'])
out = out.groupby(['id',out['date'].\
                   dt.floor(freq='H')], as_index=False)\
    .agg({'date':'first','val':'sum'})

print(out)

   id                date  val
0   1 2012-02-10 08:00:00   20
1   2 2012-02-10 08:00:00    2
2   2 2012-02-10 09:00:00    2
3   3 2012-02-10 08:00:00    6
4   3 2012-02-10 09:00:00    4
5   3 2012-02-11 09:30:00    5
6   4 2012-02-11 09:00:00    3
7   4 2012-02-11 10:00:00    3
8   4 2012-02-11 11:00:00    9
9   4 2012-02-11 12:00:00    6
  • Related