I have a DataFrame whose index holds dates and whose values are 0 or 1. I need to extract the first 1 in each of several time periods. For example:
2019-11-27 0
2019-11-29 0
2019-12-02 0
2019-12-03 1
2019-12-04 1
2019-12-05 1
2020-06-01 0
2020-06-02 0
2020-06-03 1
2020-06-04 1
2020-06-05 1
For the period from the beginning up to 2020-06-03, I want to get:
2019-12-03 1
Then, in the next period, which starts right after the previous one, I need the first row with a 1:
2020-06-04 1
I also have a column with values 0 or 1, where 1 marks the beginning of a period.
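A minimal sketch of how the period-start column could drive this, assuming the 0/1 values are in a column named "val" and the period flags in a column named "start" (both names are hypothetical). The question does not say where the period flags actually sit, so they are placed here on 2019-11-27 and 2020-06-04 purely so that the output reproduces the expected rows above.

```python
import pandas as pd

# Sample data from the question, indexed by date
dates = pd.to_datetime([
    "2019-11-27", "2019-11-29", "2019-12-02", "2019-12-03", "2019-12-04",
    "2019-12-05", "2020-06-01", "2020-06-02", "2020-06-03", "2020-06-04",
    "2020-06-05",
])
df = pd.DataFrame(
    {
        "val": [0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1],
        # Hypothetical period-start flags (1 = first row of a period), placed
        # so that the result matches the expected output in the question
        "start": [1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
    },
    index=dates,
)

# Number the periods: the running total of start flags goes up by one
# at every row where a new period begins
df["period"] = df["start"].cumsum()

# Keep only the rows holding a 1, then take the first of them in each period
firsts = df[df["val"].eq(1)].groupby("period").head(1)
print(firsts)
```

Periods that contain no 1 simply produce no row. groupby(...).head(1) is used rather than .first() because it returns whole rows in their original order with the date index intact, whereas .first() would index the result by period number.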
Answer:
You could try a counter that counts consecutive 1s, resets when it hits a 0, and saves the index of the next 1 that follows (the row where the count starts again at 1).
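A minimal sketch of that counting idea as a plain loop, assuming the 0/1 values sit in a column named "val" (a hypothetical name) on the question's date index. The helper first_ones is also hypothetical and only illustrates the counter.

```python
import pandas as pd

# Sample data from the question (column name "val" is assumed)
df = pd.DataFrame(
    {"val": [0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1]},
    index=pd.to_datetime([
        "2019-11-27", "2019-11-29", "2019-12-02", "2019-12-03", "2019-12-04",
        "2019-12-05", "2020-06-01", "2020-06-02", "2020-06-03", "2020-06-04",
        "2020-06-05",
    ]),
)

def first_ones(values):
    """Return the index labels where a run of consecutive 1s starts."""
    starts = []
    count = 0  # length of the current run of 1s
    for label, v in values.items():
        if v == 1:
            count += 1
            if count == 1:  # first 1 after a 0 (or at the very start)
                starts.append(label)
        else:
            count = 0  # a 0 resets the counter
    return starts

print(first_ones(df["val"]))
```

This finds the first 1 of every run of 1s, so on this data it returns 2019-12-03 and 2020-06-03; it does not use the period-start column, which is why the second date differs from the 2020-06-04 the question expects.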
Answer:
If I understand you correctly, you want to get the first "1" of each group of ones.
Take a DataFrame like this:
dates val
0 2019-11-29 0
1 2019-12-02 0
2 2019-12-03 1
3 2019-12-04 1
4 2019-12-05 1
5 2020-06-01 0
6 2020-06-02 0
7 2020-06-03 1
8 2020-06-04 1
9 2020-06-05 1
# True where val differs from the previous row AND equals 1, i.e. a 0 -> 1 transition
mask = df["val"].ne(df["val"].shift(1)) & df["val"].eq(1)
df.loc[mask]
Gets you:
dates val
2 2019-12-03 1
7 2020-06-03 1
We get this by shifting the column one step forward and comparing each row with the previous one. Any row whose value differs from the previous row's value is matched: either the value is 1 and the previous value is 0, or the value is 0 and the previous value is 1. The first row has no previous value (shift(1) puts NaN there), so it always counts as a change.
Then we add the second condition, that the current value must be 1, which keeps only the rows where a 1 appears right after a 0 (plus the first row, if it is a 1).
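To make each step of that explanation visible, here is a small sketch that rebuilds the answer's sample DataFrame and stores every intermediate piece of the mask as its own column. The helper names prev, changed, is_one and first_one are illustrative only.

```python
import pandas as pd

# Data from the answer, with a default integer index
df = pd.DataFrame({
    "dates": pd.to_datetime([
        "2019-11-29", "2019-12-02", "2019-12-03", "2019-12-04", "2019-12-05",
        "2020-06-01", "2020-06-02", "2020-06-03", "2020-06-04", "2020-06-05",
    ]),
    "val": [0, 0, 1, 1, 1, 0, 0, 1, 1, 1],
})

# Spell out each step of the mask as its own column
steps = pd.DataFrame({
    "val": df["val"],
    "prev": df["val"].shift(1),                   # previous row's value (NaN for the first row)
    "changed": df["val"].ne(df["val"].shift(1)),  # True on 0->1 and 1->0 transitions, and row 0
    "is_one": df["val"].eq(1),                    # True wherever the value is 1
})
steps["first_one"] = steps["changed"] & steps["is_one"]  # keep only the 0->1 transitions
print(steps)

print(df.loc[steps["first_one"]])
```

The changed column is True on rows 0, 2, 5 and 7; combining it with is_one drops the 1 -> 0 transition at row 5 and the leading 0 at row 0, leaving rows 2 and 7 as in the output above.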