I want to slice a DataFrame in pandas like this:
index a b
0 A -
1 A
2 A -
3 B -
4 C
5 C -
Within each group of column a, I want to keep all the rows from the first ' ' onward, and delete all the rows in each group starting with '-'. The outcome should be like this:
index a b
1 A
2 A -
4 C
5 C -
How can I do this?
CodePudding user response:
Use GroupBy.cummax on the comparison of b with ' ' to keep all rows from the first ' ' onward in each group:
df1 = (df[df.assign(new = lambda x: x['b'].eq(' '))
.groupby('a')['new']
.cummax()])
print (df1)
a b
1 A
2 A -
4 C
5 C -
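To make the mask easier to follow, here is a minimal, self-contained sketch that rebuilds the sample data and splits the expression into steps. The frame contents are an assumption reconstructed from the question's table (a single space is taken to be the "blank" value in b), and the names is_blank, keep and result are only illustrative.

```python
import pandas as pd

# Sample data reconstructed from the question (assumption: blank cells are a single space).
df = pd.DataFrame({
    "a": ["A", "A", "A", "B", "C", "C"],
    "b": ["-", " ", "-", "-", " ", "-"],
})

# Step 1: True where b is the blank marker.
is_blank = df["b"].eq(" ")

# Step 2: a cumulative max per group turns every row from the first True onward into True.
# astype(bool) is a defensive cast so the result is a plain boolean mask.
keep = is_blank.groupby(df["a"]).cummax().astype(bool)

# Step 3: boolean indexing keeps only the flagged rows.
result = df[keep]
print(result)
```

Group B has no blank row at all, so its mask never becomes True and the whole group is dropped.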
CodePudding user response:
For a simpler syntax, use groupby on the Series with cummax:
df[df['b'].eq(' ').groupby(df['a']).cummax()]
output:
index a b
1 1 A
2 2 A -
4 4 C
5 5 C -
If you also want to delete groups that start with - ("delete all the rows in each group starting with '-'"), you can combine cummin/cummax:
df[df['b'].ne('-').groupby(df['a'], group_keys=False).apply(lambda s: s.cummin().cummax())]
output:
index a b
4 4 C
5 5 C -
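The cummin/cummax combination can also be sketched with transform instead of apply. This is a hedged variant, not the answer's exact code: transform returns a result aligned to the original index, which may matter on recent pandas versions where apply can add the group keys to the result index. The sample frame is again an assumption reconstructed from the question, and not_dash, keep_strict and strict are illustrative names.

```python
import pandas as pd

# Sample data reconstructed from the question (assumption: blank cells are a single space).
df = pd.DataFrame({
    "a": ["A", "A", "A", "B", "C", "C"],
    "b": ["-", " ", "-", "-", " ", "-"],
})

# True where b is not '-'.
not_dash = df["b"].ne("-")

# Per group: cummin stays True only while no '-' has been seen yet,
# so a group whose first row is '-' becomes all False.
# cummax then spreads a surviving leading True over the rest of the group.
keep_strict = (
    not_dash.groupby(df["a"])
    .transform(lambda s: s.cummin().cummax())
    .astype(bool)  # defensive cast so the mask is plain boolean
)

strict = df[keep_strict]
print(strict)
```

Only group C starts with a non-'-' row, so it is the only group that remains.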