I have a list of activities and the approximate timestamp they occur in. I would like to count the occurences of a string in the previous 'x' rows (walking or running etc.) and add it to the dataframe. Pandas DataFrame does not support rolling (for non-numeric data) and I'm not sure if I can use shift to check like the previous 30, 50 or even 70 rows of data. I haven't made any concrete progress yet as I have been looking for similar questions/solutions on the site.
timestamp event
0 2021-12-18 18:20:25 08:00 running
1 2021-12-18 18:20:27 08:00 running
2 2021-12-18 18:20:29 08:00 walking
3 2021-12-18 18:20:31 08:00 walking
4 2021-12-18 18:20:33 08:00 walking
5 2021-12-18 18:20:35 08:00 walking
6 2021-12-18 18:20:37 08:00 walking
7 2021-12-18 18:20:39 08:00 walking
8 2021-12-18 18:20:41 08:00 stationary
9 2021-12-18 18:20:43 08:00 stationary
10 2021-12-18 18:20:45 08:00 stationary
11 2021-12-18 18:20:47 08:00 stationary
df.loc[:, 'Count previous K'] = 0 # new column to count previous row activities
expected output:
timestamp event Count previous K
0 2021-12-18 18:20:25 08:00 running 0
1 2021-12-18 18:20:27 08:00 running 0
2 2021-12-18 18:20:29 08:00 walking 1
3 2021-12-18 18:20:31 08:00 walking 2
4 2021-12-18 18:20:33 08:00 walking 3
5 2021-12-18 18:20:35 08:00 walking 4
6 2021-12-18 18:20:37 08:00 walking 5
7 2021-12-18 18:20:39 08:00 walking 6
8 2021-12-18 18:20:41 08:00 stationary 6
9 2021-12-18 18:20:43 08:00 stationary 6
10 2021-12-18 18:20:45 08:00 stationary 6
11 2021-12-18 18:20:47 08:00 stationary 6
12 2021-12-18 18:20:49 08:00 stationary 5
for a window size of 10 (including current index/row) counting occurences of walking.
CodePudding user response:
I think with your small sample DataFrame of size 12, you'll get mostly NaN if you use a rolling window of 10. However you can use a boolean to see when a particular event is occurring, then perform a rolling sum on the boolean series:
df['walking_count'] = (df['event'] == 'walking').rolling(5, min_periods=1).sum()
This sets a new column 'walking_count'
to the following series:
0 0.0
1 0.0
2 1.0
3 2.0
4 3.0
5 4.0
6 5.0
7 5.0
8 4.0
9 3.0
10 2.0
11 1.0