Home > Net >  Sum hourly values between 2 dates in pandas
Sum hourly values between 2 dates in pandas

Time:04-27

I have a df like this:

                    DATE   PP
0     2011-12-20 07:00:00  0.0
1     2011-12-20 08:00:00  0.0
2     2011-12-20 09:00:00  2.0
3     2011-12-20 10:00:00  0.0
4     2011-12-20 11:00:00  0.0
5     2011-12-20 12:00:00  0.0
6     2011-12-20 13:00:00  0.0
7     2011-12-20 14:00:00  5.0
8     2011-12-20 15:00:00  0.0
9     2011-12-20 16:00:00  0.0
10    2011-12-20 17:00:00  2.0
11    2011-12-20 18:00:00  0.0
12    2011-12-20 19:00:00  0.0
13    2011-12-20 20:00:00  1.0
14    2011-12-20 21:00:00  0.0
15    2011-12-20 22:00:00  0.0
16    2011-12-20 23:00:00  0.0
17    2011-12-21 00:00:00  0.0
18    2011-12-21 01:00:00  3.0
19    2011-12-21 02:00:00  0.0
20    2011-12-21 03:00:00  0.0
21    2011-12-21 04:00:00  0.0
22    2011-12-21 05:00:00  0.0
23    2011-12-21 06:00:00  5.0
24    2011-12-21 07:00:00  0.0
...   ....       ...       ...
75609 2020-08-05 16:00:00  0.0
75610 2020-08-05 19:00:00  0.0

[75614 rows x 2 columns]

I want the cumulative values of PP column between 2 specific hourly dates in different days. I want the sum of every 07:00:00 from one day to the 07:00:00 of the next day. For example i want the cumulative values of PP from 2011-12-20 07:00:00 to 2011-12-21 07:00:00:

Expected result:

   DATE        CUMULATIVE VALUES PP
0  2011-12-20    18
1  2011-12-21    5
2  2011-12-22    10
etc... etc...    ...

I tried this:

df['DAY'] = df['DATE'].dt.strftime('%d')
cumulatives=pd.DataFrame(df.groupby(['DAY'])['PP'].sum())

But this only sums the entire day, not between 7:00:00 to 7:00:00 of days.

Data:

{'DATE': ['2011-12-20 07:00:00', '2011-12-20 08:00:00', '2011-12-20 09:00:00', 
          '2011-12-20 10:00:00', '2011-12-20 11:00:00', '2011-12-20 12:00:00', 
          '2011-12-20 13:00:00', '2011-12-20 14:00:00', '2011-12-20 15:00:00', 
          '2011-12-20 16:00:00', '2011-12-20 17:00:00', '2011-12-20 18:00:00', 
          '2011-12-20 19:00:00', '2011-12-20 20:00:00', '2011-12-20 21:00:00', 
          '2011-12-20 22:00:00', '2011-12-20 23:00:00', '2011-12-21 00:00:00', 
          '2011-12-21 01:00:00', '2011-12-21 02:00:00', '2011-12-21 03:00:00',
          '2011-12-21 04:00:00', '2011-12-21 05:00:00', '2011-12-21 06:00:00', 
          '2011-12-21 07:00:00', '2020-08-05 16:00:00', '2020-08-05 19:00:00'], 
 'PP': [0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 2.0, 0.0, 0.0, 1.0, 
        0.0, 0.0, 0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0]}

CodePudding user response:

One way is to subtract 7hours from date so that each day starts at 17:00 of the previous day; then groupby.sum fetches the desired output:

df['DATE'] = pd.to_datetime(df['DATE'])
out = df.groupby(df['DATE'].sub(pd.to_timedelta('7h')).dt.date)['PP'].sum().reset_index(name='SUM')

Output:

         DATE   SUM
0  2011-12-20  18.0
1  2011-12-21   0.0
2  2020-08-05   0.0
  • Related