Home > OS >  Pandas - Rolling window count with condition
Pandas - Rolling window count with condition

Time:09-22

How is it possible to select only some rows, based on a given condition, in a Pandas rolling window count ? I have not find any solution in the documentation, or in other questions.

With the following dataframe:

random.seed(42)
date_0 = datetime.datetime(2020, 1, 1, 0, 0, 0, 0)
dates = [date_0   datetime.timedelta(seconds=random.uniform(0, 120)) for i in range(500)]
dates.sort()
speeds = [random.uniform(1, 10) for i in range(500)]
speeds.sort()
pressures = [i**2   random.normalvariate(0, 1) for i in speeds]
data = [speeds, pressures]
df = pd.DataFrame(data=list(zip(speeds, pressures)), columns=['speed', 'pressure'], index=dates)
dates_1 = random.sample(dates, int(len(dates) * 0.6))
df.loc[:, 'controlled'] = False
df.loc[dates_1, 'controlled'] = True
df.loc[:, 'rolling_obs_count'] = df.loc[:, 'speed'].rolling(window=str(1)   's').count()
print('df: \n', df)
df: 
                                speed    pressure  controlled  rolling_obs_count
2020-01-01 00:00:00.048713  1.024082    4.491483        True                1.0
2020-01-01 00:00:00.068628  1.084577    0.773474        True                2.0
2020-01-01 00:00:00.202953  1.091360    0.584872        True                3.0
2020-01-01 00:00:00.425483  1.125268    2.361184       False                4.0
2020-01-01 00:00:00.665378  1.127335    2.050226        True                5.0
...                              ...         ...         ...                ...
2020-01-01 00:01:59.531574  9.945263   98.811644        True                5.0
2020-01-01 00:01:59.534566  9.976833   99.481287       False                6.0
2020-01-01 00:01:59.704513  9.990121   99.452698       False                6.0
2020-01-01 00:01:59.814533  9.996152   99.479074        True                6.0
2020-01-01 00:01:59.913896  9.999170  100.584748        True                7.0

the count function counts all rows within the rolling window, and I need to count only the rows where "controlled" column is "True". How can I do that ?

CodePudding user response:

Perhaps simply:

>>> df.rolling('1s')['controlled'].sum()
2020-01-01 00:00:00.048713    1.0
2020-01-01 00:00:00.068628    2.0
2020-01-01 00:00:00.202953    3.0
2020-01-01 00:00:00.425483    3.0
2020-01-01 00:00:00.665378    4.0
                             ... 
2020-01-01 00:01:59.531574    5.0
2020-01-01 00:01:59.534566    5.0
2020-01-01 00:01:59.704513    4.0
2020-01-01 00:01:59.814533    4.0
2020-01-01 00:01:59.913896    5.0
  • Related