How is it possible to select only some rows, based on a given condition, in a Pandas rolling window count ? I have not find any solution in the documentation, or in other questions.
With the following dataframe:
random.seed(42)
date_0 = datetime.datetime(2020, 1, 1, 0, 0, 0, 0)
dates = [date_0 datetime.timedelta(seconds=random.uniform(0, 120)) for i in range(500)]
dates.sort()
speeds = [random.uniform(1, 10) for i in range(500)]
speeds.sort()
pressures = [i**2 random.normalvariate(0, 1) for i in speeds]
data = [speeds, pressures]
df = pd.DataFrame(data=list(zip(speeds, pressures)), columns=['speed', 'pressure'], index=dates)
dates_1 = random.sample(dates, int(len(dates) * 0.6))
df.loc[:, 'controlled'] = False
df.loc[dates_1, 'controlled'] = True
df.loc[:, 'rolling_obs_count'] = df.loc[:, 'speed'].rolling(window=str(1) 's').count()
print('df: \n', df)
df:
speed pressure controlled rolling_obs_count
2020-01-01 00:00:00.048713 1.024082 4.491483 True 1.0
2020-01-01 00:00:00.068628 1.084577 0.773474 True 2.0
2020-01-01 00:00:00.202953 1.091360 0.584872 True 3.0
2020-01-01 00:00:00.425483 1.125268 2.361184 False 4.0
2020-01-01 00:00:00.665378 1.127335 2.050226 True 5.0
... ... ... ... ...
2020-01-01 00:01:59.531574 9.945263 98.811644 True 5.0
2020-01-01 00:01:59.534566 9.976833 99.481287 False 6.0
2020-01-01 00:01:59.704513 9.990121 99.452698 False 6.0
2020-01-01 00:01:59.814533 9.996152 99.479074 True 6.0
2020-01-01 00:01:59.913896 9.999170 100.584748 True 7.0
the count function counts all rows within the rolling window, and I need to count only the rows where "controlled" column is "True". How can I do that ?
CodePudding user response:
Perhaps simply:
>>> df.rolling('1s')['controlled'].sum()
2020-01-01 00:00:00.048713 1.0
2020-01-01 00:00:00.068628 2.0
2020-01-01 00:00:00.202953 3.0
2020-01-01 00:00:00.425483 3.0
2020-01-01 00:00:00.665378 4.0
...
2020-01-01 00:01:59.531574 5.0
2020-01-01 00:01:59.534566 5.0
2020-01-01 00:01:59.704513 4.0
2020-01-01 00:01:59.814533 4.0
2020-01-01 00:01:59.913896 5.0