I need to remove the last n rows where Status equals 1. My attempt:
v = df[df['Status'] == 1].count()
f = df[df['Status'] == 0].count()
diff = v - f
diff
df2 = df[~df['Status'] == 1].tail(diff).all() #ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()
df2
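For reference, the attempt above has three problems. DataFrame.count() returns one count per column (a Series), so diff is a Series rather than an integer, and tail() cannot use it as a row count; that is what raises the ambiguous-truth-value error. Separately, ~ binds tighter than ==, so ~df['Status'] == 1 parses as (~df['Status']) == 1, and the bitwise inversion of 1 and 0 gives -2 and -1, so that mask is all False. Finally, the trailing .all() would collapse the result to one boolean per column. The sketch below assumes (from the diff = v - f step) that the goal is to drop the surplus Status == 1 rows from the end so both classes end up with the same count:

```python
import pandas as pd

df = pd.DataFrame({"Status": [1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1]})

# Plain integer counts per class; DataFrame.count() would instead return
# one count per column (a Series), which tail() cannot take as a row count.
ones = int(df["Status"].eq(1).sum())
zeros = int(df["Status"].eq(0).sum())

# Number of surplus Status == 1 rows; clamp at 0 because tail() with a
# negative n returns all rows except the first |n|.
diff = max(ones - zeros, 0)

# Index labels of the last `diff` rows with Status == 1, then drop them.
to_drop = df[df["Status"] == 1].tail(diff).index
df2 = df.drop(to_drop)
print(df2["Status"].value_counts())
```

With this sample frame there are 7 ones and 5 zeros, so the last two Status == 1 rows (labels 9 and 11) are dropped.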
Answer:
Use groupby() and transform() to mark the rows to keep:
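import pandas as pd

df = pd.DataFrame({"Status": [1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1]})
n = 3
# Within the Status == 1 group, keep only rows positioned before the last n;
# every other group is kept whole (the scalar True is broadcast).
df["Keep"] = df.groupby("Status")["Status"].transform(
    lambda x: x.reset_index().index < len(x) - n if x.name == 1 else True
)
df.loc[df["Keep"]].drop(columns="Keep")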
Answer:
Check whether Status is equal to 1 (Series.eq) and keep only the entries where it is; .loc[lambda s: s] does that through boolean indexing. The index labels of the last n such rows (tail) are then passed to drop:
df.drop(df.Status.eq(1).loc[lambda s: s].tail(n).index)
Sample run:

In [343]: df
Out[343]:
   Status
0       1
1       2
2       3
3       2
4       1
5       1
6       1
7       2

In [344]: n
Out[344]: 2

In [345]: df.Status.eq(1)
Out[345]:
0     True
1    False
2    False
3    False
4     True
5     True
6     True
7    False
Name: Status, dtype: bool

In [346]: df.Status.eq(1).loc[lambda s: s]
Out[346]:
0    True
4    True
5    True
6    True
Name: Status, dtype: bool

In [347]: df.Status.eq(1).loc[lambda s: s].tail(n)
Out[347]:
5    True
6    True
Name: Status, dtype: bool

In [348]: df.Status.eq(1).loc[lambda s: s].tail(n).index
Out[348]: Int64Index([5, 6], dtype='int64')

In [349]: df.drop(df.Status.eq(1).loc[lambda s: s].tail(n).index)
Out[349]:
   Status
0       1
1       2
2       3
3       2
4       1
7       2
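One caveat with the drop-based answer: df.drop removes rows by index label, so if the index contains duplicate labels, every row carrying a selected label is removed, not just the last n matches. A minimal positional sketch follows; the helper name drop_last_n_matching is hypothetical, not a pandas API.

```python
import numpy as np
import pandas as pd


def drop_last_n_matching(df: pd.DataFrame, column: str, value, n: int) -> pd.DataFrame:
    """Hypothetical helper: drop the last n rows where df[column] == value.

    Selects rows by position (iloc) instead of by index label, so it also
    behaves correctly when the index contains duplicate labels.
    """
    # Positions (0-based) of all matching rows, in order.
    positions = np.flatnonzero(df[column].eq(value).to_numpy())
    # Clamp n to [0, number of matches] and take that many from the end.
    k = min(max(n, 0), len(positions))
    drop_pos = positions[len(positions) - k:]
    keep = np.ones(len(df), dtype=bool)
    keep[drop_pos] = False
    return df.iloc[keep]


# The sample frame from the run above.
df = pd.DataFrame({"Status": [1, 2, 3, 2, 1, 1, 1, 2]})
print(drop_last_n_matching(df, "Status", 1, 2))

# Duplicate labels: label-based df.drop([1]) would remove two rows here,
# while the positional helper removes exactly one.
dup = pd.DataFrame({"Status": [1, 1, 1, 1]}, index=[0, 0, 1, 1])
print(len(drop_last_n_matching(dup, "Status", 1, 1)))
```

On the sample frame this gives the same rows as Out[349]; with a unique index the label-based one-liner is simpler and equivalent.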