Looking for the pandas syntax for grouping rows that fall between True values in a boolean column.
DataTable
distance | time | new |
---|---|---|
0 | 2020-10-11 1:30:15 | true |
5 | 2020-10-11 1:31:15 | false |
10 | 2020-10-11 1:32:15 | false |
1 | 2020-10-11 2:30:15 | true |
2 | 2020-10-11 2:31:15 | false |
30 | 2020-10-11 2:32:15 | false |
31 | 2020-10-11 2:33:15 | false |
0 | 2020-10-12 1:31:15 | true |
5 | 2020-10-12 1:32:15 | false |
Question
Looking to group all the rows that are a part of each activity. A new activity is determined by the true value in the new column. Based on the data above there should be 3 groups. Each group needs to start with the first row with True and end with the last False before the next True.
How can this be achieved using groupby?
Code
Here is how I currently use groupby to group by date. As I said above, I want to change this to group on the "new" column so that rows are grouped into activities. I'm planning to use it in a for loop like this:
for idx, day in df.groupby(df.timestamp.dt.date):
    print(idx)
    print(day)
CodePudding user response:
Just group by df['new'].cumsum():
for idx, day in df.groupby(df['new'].cumsum()):
    print('-----')
    print(idx)
    print(day)
Output:
-----
1
   distance                time    new
0         0  2020-10-11 1:30:15   True
1         5  2020-10-11 1:31:15  False
2        10  2020-10-11 1:32:15  False
-----
2
   distance                time    new
3         1  2020-10-11 2:30:15   True
4         2  2020-10-11 2:31:15  False
5        30  2020-10-11 2:32:15  False
6        31  2020-10-11 2:33:15  False
-----
3
   distance                time    new
7         0  2020-10-12 1:31:15   True
8         5  2020-10-12 1:32:15  False
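
Why this works: in a cumulative sum, True counts as 1 and False as 0, so the running total goes up by one at every True row and stays flat on the False rows that follow. Each activity therefore gets its own integer label. Below is a minimal, self-contained sketch that rebuilds the sample table from the question and prints the intermediate labels. It assumes "new" holds real booleans (not the strings "true"/"false") and keeps "time" as plain strings, since the grouping does not depend on it. The names activity_id and sizes are only illustrative.

```python
import pandas as pd

# Sample data copied from the question; "time" is left as strings here
df = pd.DataFrame({
    "distance": [0, 5, 10, 1, 2, 30, 31, 0, 5],
    "time": [
        "2020-10-11 1:30:15", "2020-10-11 1:31:15", "2020-10-11 1:32:15",
        "2020-10-11 2:30:15", "2020-10-11 2:31:15", "2020-10-11 2:32:15",
        "2020-10-11 2:33:15", "2020-10-12 1:31:15", "2020-10-12 1:32:15",
    ],
    # Assumption: real booleans, not the strings "true"/"false"
    "new": [True, False, False, True, False, False, False, True, False],
})

# Running count of True values: one label per activity
activity_id = df["new"].cumsum()
print(activity_id.tolist())  # [1, 1, 1, 2, 2, 2, 2, 3, 3]

# Group on the labels and record how many rows each activity has
sizes = {int(idx): len(activity) for idx, activity in df.groupby(activity_id)}
print(sizes)  # {1: 3, 2: 4, 3: 2}
```

If the first row of the real data is not True, the rows before the first True would all land in a group labelled 0, so it may be worth checking for that case.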