I have the following dataset. Within each ID, I need to delete the rows with Flag 0 that come before the first row with Flag 1.
ID Flag
103200 0
103200 1
103200 0
104752 0
104752 0
104752 1
104752 0
104752 1
104752 0
104752 0
104760 0
104760 1
Here is the result I want:
ID Flag
103200 1
103200 0
104752 1
104752 0
104752 1
104752 0
104752 0
104760 1
CodePudding user response:
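Use groupby.cummax and boolean indexing. The cumulative maximum of Flag within each ID stays 0 until the first 1 appears, so keeping only the rows where it is nonzero removes the leading 0 rows: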
out = df[df.groupby('ID')['Flag'].cummax().ne(0)]
# or
# out = df[df['Flag'].ne(0).groupby(df['ID']).cummax()]
output:
ID Flag
1 103200 1
2 103200 0
5 104752 1
6 104752 0
7 104752 1
8 104752 0
9 104752 0
11 104760 1
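To see why this works, here is a minimal self-contained sketch that rebuilds the sample data from the question and exposes the intermediate mask. The names `seen_one` and `out` are only illustrative, and the DataFrame is constructed by hand on the assumption that it has a default integer index.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "ID": [103200] * 3 + [104752] * 7 + [104760] * 2,
    "Flag": [0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1],
})

# Running maximum of Flag within each ID:
# 0 for every row before the first 1, then 1 from that row onwards
seen_one = df.groupby("ID")["Flag"].cummax()

# Keep only the rows at or after the first 1 of their ID
out = df[seen_one.ne(0)]
print(out)
```

One consequence worth knowing: an ID that never has Flag equal to 1 keeps a running maximum of 0 throughout, so all of its rows are dropped.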
CodePudding user response:
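Use the DataFrame index to find the rows where Flag == 1, then drop the row at that index minus 1. Note that this removes only the single row directly before each 1, regardless of ID, so the output below differs from the requested result (the row at index 4 is kept and the row at index 7 is dropped).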
data="""ID Flag
103200 0
103200 1
103200 0
104752 0
104752 0
104752 1
104752 0
104752 1
104752 0
104752 0
104760 0
104760 1"""
import pandas as pd

mylist = []
data = data.split("\n")
for item in data:
    # split each line on any whitespace (spaces or tabs)
    elements = item.split()
    mylist.append(elements)
df = pd.DataFrame(mylist)
# promote the first row to the header, then remove it from the data
df.columns = df.iloc[0]
df.drop(index=df.index[0], inplace=True)
df["Flag"] = df["Flag"].astype(int)
# drop the row immediately before each row where Flag == 1
df.drop(df[df["Flag"] == 1].index - 1, inplace=True)
print(df)
output:
0 ID Flag
2 103200 1
3 103200 0
4 104752 0
6 104752 1
8 104752 1
9 104752 0
10 104752 0
12 104760 1
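If an index-based approach is preferred, one possible group-aware variant is sketched below. It is not the code from this answer: it looks up the index label of the first 1 within each ID and keeps the rows at or after it. It assumes a default, increasing integer index and that every ID contains at least one 1 (for an ID with no 1, idxmax would point at that ID's first row and nothing would be dropped). The names `first_one` and `keep` are illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "ID": [103200] * 3 + [104752] * 7 + [104760] * 2,
    "Flag": [0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1],
})

# Index label of the first row with the maximum Flag (the first 1) per ID
first_one = df.groupby("ID")["Flag"].idxmax()

# For every row, compare its own index label with the first-1 label of its ID
keep = df.index.to_series().ge(df["ID"].map(first_one))

out = df[keep]
print(out)
```

On the sample data this selects the same rows as the cummax approach in the first answer.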