Here's some made up time series data on 1 minute intervals:
import pandas as pd
import numpy as np
import random
random.seed(5)
rows,cols = 8760,3
data = np.random.rand(rows,cols)
tidx = pd.date_range('2019-01-01', periods=rows, freq='1T')
df = pd.DataFrame(data, columns=['condition1','condition2','condition3'], index=tidx)
This is just some code to create some Boolean columns
df['condition1_bool'] = df['condition1'].lt(.1)
df['condition2_bool'] = df['condition2'].lt(df['condition1']) & df['condition2'].gt(df['condition3'])
df['condition3_bool'] = df['condition3'].gt(.9)
df = df[['condition1_bool','condition2_bool','condition3_bool']]
df = df.astype(int)
On my screen this prints:
condition1_bool condition2_bool condition3_bool
2019-01-01 00:00:00 0 0 0
2019-01-01 00:01:00 0 0 1 <---- Count as same event!
2019-01-01 00:02:00 0 0 1 <---- Count as same event!
2019-01-01 00:03:00 1 0 0
2019-01-01 00:04:00 0 0 0
What I am trying to figure out is how to rollup per hour cumulative events (True or 1
) but if there is no 0 between events, its the same event! Hopefully that makes sense what I was describing above on the <---- Count as same event!
If I do:
df = df.resample('H').sum()
This will just resample and count all events, right regardless of the time series commitment I was trying to highlight with the <---- Count as same event!
Thanks for any tips!!
CodePudding user response:
Check if the current row ("2019-01-01 00:02:00") equals to 1 and check if the previous row ("2019-01-01 00:01:00") is not equal to 1. This removes consecutive 1 of the sum.
>>> df.resample('H').apply(lambda x: (x.eq(1) & x.shift().ne(1)).sum())
condition1_bool condition2_bool condition3_bool
2019-01-01 00:00:00 4 8 4
2019-01-01 01:00:00 9 7 6
2019-01-01 02:00:00 7 14 4
2019-01-01 03:00:00 2 8 7
2019-01-01 04:00:00 4 9 5
... ... ... ...
2019-01-06 21:00:00 4 8 2
2019-01-06 22:00:00 3 11 4
2019-01-06 23:00:00 6 11 4
2019-01-07 00:00:00 8 7 8
2019-01-07 01:00:00 4 9 6
[146 rows x 3 columns]
Using your code:
>>> df.resample('H').sum()
condition1_bool condition2_bool condition3_bool
2019-01-01 00:00:00 5 8 5
2019-01-01 01:00:00 9 8 6
2019-01-01 02:00:00 7 14 5
2019-01-01 03:00:00 2 9 7
2019-01-01 04:00:00 4 11 5
... ... ... ...
2019-01-06 21:00:00 5 11 3
2019-01-06 22:00:00 3 15 4
2019-01-06 23:00:00 6 12 4
2019-01-07 00:00:00 8 7 10
2019-01-07 01:00:00 4 9 7
[146 rows x 3 columns]
Check:
dti = pd.date_range('2021-11-15 21:00:00', '2021-11-15 22:00:00',
closed='left', freq='T')
df1 = pd.DataFrame({'c1': 1}, index=dti)
>>> df1.resample('H').apply(lambda x: (x.eq(1) & x.shift().ne(1)).sum())
c1
2021-11-15 21:00:00 1
>>> df1.resample('H').sum()
c1
2021-11-15 21:00:00 60