Home > Software design >  how to count occurrences of specific string in previous x rows
how to count occurrences of specific string in previous x rows

Time:01-02

I have a list of activities and the approximate timestamp they occur in. I would like to count the occurences of a string in the previous 'x' rows (walking or running etc.) and add it to the dataframe. Pandas DataFrame does not support rolling (for non-numeric data) and I'm not sure if I can use shift to check like the previous 30, 50 or even 70 rows of data. I haven't made any concrete progress yet as I have been looking for similar questions/solutions on the site.

    timestamp   event
0   2021-12-18 18:20:25 08:00   running
1   2021-12-18 18:20:27 08:00   running
2   2021-12-18 18:20:29 08:00   walking
3   2021-12-18 18:20:31 08:00   walking
4   2021-12-18 18:20:33 08:00   walking
5   2021-12-18 18:20:35 08:00   walking
6   2021-12-18 18:20:37 08:00   walking
7   2021-12-18 18:20:39 08:00   walking
8   2021-12-18 18:20:41 08:00   stationary
9   2021-12-18 18:20:43 08:00   stationary
10  2021-12-18 18:20:45 08:00   stationary
11  2021-12-18 18:20:47 08:00   stationary

df.loc[:, 'Count previous K'] = 0  # new column to count previous row activities

expected output:

    timestamp   event   Count previous K
0   2021-12-18 18:20:25 08:00   running 0
1   2021-12-18 18:20:27 08:00   running 0
2   2021-12-18 18:20:29 08:00   walking 1
3   2021-12-18 18:20:31 08:00   walking 2
4   2021-12-18 18:20:33 08:00   walking 3
5   2021-12-18 18:20:35 08:00   walking 4
6   2021-12-18 18:20:37 08:00   walking 5
7   2021-12-18 18:20:39 08:00   walking 6
8   2021-12-18 18:20:41 08:00   stationary  6
9   2021-12-18 18:20:43 08:00   stationary  6
10  2021-12-18 18:20:45 08:00   stationary  6
11  2021-12-18 18:20:47 08:00   stationary  6
12  2021-12-18 18:20:49 08:00   stationary  5

for a window size of 10 (including current index/row) counting occurences of walking.

CodePudding user response:

I think with your small sample DataFrame of size 12, you'll get mostly NaN if you use a rolling window of 10. However you can use a boolean to see when a particular event is occurring, then perform a rolling sum on the boolean series:

df['walking_count'] = (df['event'] == 'walking').rolling(5, min_periods=1).sum()

This sets a new column 'walking_count' to the following series:

0     0.0
1     0.0
2     1.0
3     2.0
4     3.0
5     4.0
6     5.0
7     5.0
8     4.0
9     3.0
10    2.0
11    1.0
  • Related