Pandas - drop n rows by column value

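I need to remove the last n rows where Status equals 1. Here n is the number of rows with Status 1 minus the number of rows with Status 0. My attempt: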

v = df[df['Status'] == 1].count()
f = df[df['Status'] == 0].count()
diff = v - f
diff

df2 = df[~df['Status'] == 1].tail(diff).all()
# ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()
df2

Answer:

Using groupby() and transform() to mark the rows to keep (every row with Status 0, plus all but the last n rows with Status 1):

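import pandas as pd

df = pd.DataFrame({"Status": [1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1]})
n = 3

# Inside the Status == 1 group, keep only positions before the last n;
# every row of the other group is kept
df["Keep"] = df.groupby("Status")["Status"].transform(
    lambda x: x.reset_index().index < len(x) - n if x.name == 1 else True
)
df.loc[df["Keep"]].drop(columns="Keep")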
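The same selection can also be written without the helper Keep column. The sketch below swaps the transform lambda for GroupBy.cumcount(ascending=False), which numbers the rows of each group counting from the end, so the last n rows with Status 1 are exactly those with a reverse count below n. On this data it should select the same rows as the transform version.

```python
import pandas as pd

df = pd.DataFrame({"Status": [1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1]})
n = 3

# cumcount(ascending=False) numbers each row within its Status group from the
# end: the last row of each group gets 0, the one before it 1, and so on
from_end = df.groupby("Status").cumcount(ascending=False)

# Drop a row only if it has Status == 1 and is among the last n of that group
drop_mask = df["Status"].eq(1) & (from_end < n)

result = df.loc[~drop_mask]
print(result)
```

Unlike the version above, this leaves df itself unchanged, since no extra column is assigned.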

Answer:

Check whether Status equals 1 and keep only the positions where it does (.loc[lambda s: s] does this with boolean indexing, because the callable returns the mask itself). The index labels of the last n such rows are then passed to drop:

df.drop(df.Status.eq(1).loc[lambda s: s].tail(n).index)

Sample run:

In [343]: df
Out[343]:
   Status
0       1
1       2
2       3
3       2
4       1
5       1
6       1
7       2

In [344]: n
Out[344]: 2

In [345]: df.Status.eq(1)
Out[345]:
0     True
1    False
2    False
3    False
4     True
5     True
6     True
7    False
Name: Status, dtype: bool

In [346]: df.Status.eq(1).loc[lambda s: s]
Out[346]:
0    True
4    True
5    True
6    True
Name: Status, dtype: bool

In [347]: df.Status.eq(1).loc[lambda s: s].tail(n)
Out[347]:
5    True
6    True
Name: Status, dtype: bool

In [348]: df.Status.eq(1).loc[lambda s: s].tail(n).index
Out[348]: Int64Index([5, 6], dtype='int64')

In [349]: df.drop(df.Status.eq(1).loc[lambda s: s].tail(n).index)
Out[349]:
   Status
0       1
1       2
2       3
3       2
4       1
7       2
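Applied to the question's setup, where n is meant to be the surplus of Status 1 rows over Status 0 rows, two details of the original attempt need changing. DataFrame.count() returns one count per column (a Series), not an integer, and passing that Series to tail() is the most likely source of the ValueError. Also, ~ binds more tightly than ==, so ~df['Status'] == 1 is evaluated as (~df['Status']) == 1, a bitwise NOT of the integers. A minimal sketch on made-up data that computes n as a plain integer and then reuses the drop above:

```python
import pandas as pd

# Made-up data: five rows with Status 1 and two with Status 0
df = pd.DataFrame({"Status": [1, 0, 1, 1, 0, 1, 1]})

# Summing a boolean mask gives a scalar count; DataFrame.count() would
# instead return a Series with one count per column
n_ones = int(df["Status"].eq(1).sum())
n_zeros = int(df["Status"].eq(0).sum())
n = n_ones - n_zeros  # surplus of Status == 1 rows (3 here)

# Index labels of the last n rows with Status == 1; max(n, 0) keeps tail()
# from receiving a negative argument when there is no surplus
to_drop = df["Status"].eq(1).loc[lambda s: s].tail(max(n, 0)).index

result = df.drop(to_drop)
print(result)
```

The max(n, 0) guard matters because tail() with a negative argument returns every row except the first |n| instead of nothing. Since drop() works on index labels, a frame with a non-unique index could lose more rows than intended; calling reset_index(drop=True) first avoids that.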