deleting specific rows of a dataframe based on a certain condition-CodePudding

I am looking for a code that would get rid of rows for instances where the REVENUE_STATUS_FLAG is 0 before it becomes 1. HOWEVER, I would not like to get rid of the 0's after where 1 has occurred for that SCU_KEY (which is all based on sorted date). So with the image shown, the values I would like to get rid of these rows, which I categorized based on the index:

1166, 221 - (first two off SCU_KEY 3)

333, 1645, 1614 - (from SCU_KEY 14)

You will notice that I don't want to get rid of 334 from SCU_KEY 3 and that is because of the 1 occurring right before it from 45. This df is bigger than is shown so manually inputting the specific numbers mentioned does not suffice.

CodePudding user response：

Since you only care about the 0's before the 1's, you can use the cumulative sum function over the SCU_KEY to find the rows where it is the first non-zero or after the first non-zero. Cumulative sum becomes how many 1's were seen in this row or before this row. As long as that is 1 or more, you keep the row.

df = <your data frame>

df[df.groupby('SCU_KEY').cumsum().values >= 1]`