really struggling with this pandas task for a research project.
I have a dataframe df
that has two columns: time
(a datetime column) and result
(a boolean column). I want to count the streak of TRUE rows immediately preceding the current row with a look-back window of 7 days.
For example:
If the preceding row is false, then the count is 0
If the preceding row is true, then I want to know what the streak of true rows was within the 7 day period preceding that row.
Example of expected output below.
time | result | DESIRED OUTPUT |
---|---|---|
5/1/21 | TRUE | 0 (no preceding rows) |
5/6/21 | TRUE | 1 |
5/8/21 | FALSE | 2 (immediately preceded by streak of 2 TRUE rows in past 7 days) |
5/10/21 | FALSE | 0 |
5/11/21 | TRUE | 0 |
5/14/21 | TRUE | 1 (preceding row is TRUE) |
5/20/21 | TRUE | 1 (immediately preceded by streak of one TRUE rows in 1 week window) |
5/21/21 | TRUE | 2 (immediately preceded by streak of two TRUE rows in 1 week window) |
5/22/21 | TRUE | 2 (immediately preceded by streak of two TRUE rows in 1 week window) |
5/23/21 | FALSE | 3 (immediately preceded by streak of three TRUE rows in 1 week window) |
5/24/21 | TRUE | 0 (preceded by FALSE row) |
5/26/21 | TRUE | 1 (immediately preceded by streak of 1 TRUE row) |
I have been scouring Stack Overflow and racking my brain for days but just can't figure out a way to do this well. The trick with shift and groupby, e.g. df * (df.groupby((df != df.shift()).cumsum()).cumcount())
would work perfectly except it doesn't take into account the 7-day lookback window and the data is sampled irregularly, so I can't make assumptions on how many rows appear there will be in a 7-day period.
Thank you all so much for your time and help!
CodePudding user response:
I believe you were on the right track. This
Please notice I converted your date column into a datetime index, which I think is required for this to work. You could always make a small, temporary dataframe to do this, if you don't want to convert the whole thing.
Using sum()
works because summing a boolean column gives you the number of True values in it.