I have a data frame, and I check whether each row is good or bad. If a row is bad, I want to drop it along with the previous two rows (n=2); in my actual problem n=60. I have a working solution, but is there a better way? I'd like to check whether my solution is the Pythonic way of doing it. My code:
import pandas as pd
df = pd.DataFrame({'A': [10, 20, 30, 40, 50], 'isBad?': [False, False, True, False, False]})
df =
    A  isBad?
0  10   False
1  20   False
2  30    True
3  40   False
4  50   False
Expected answer:
df =
    A  isBad?
0  40   False
1  50   False
My solution:
n = 2  # number of previous rows to drop along with each bad row
# Collect the labels of every bad row and the n rows before it (assumes a default RangeIndex)
bad_row_index = pd.concat(df.loc[i-n:i] for i, r in df.iterrows() if r['isBad?']).index
df = df[~df.index.isin(bad_row_index)].reset_index(drop=True)
df =
    A  isBad?
0  40   False
1  50   False
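For n=60 the iterrows loop and the concat of many small slices get slow. A minimal sketch of the same idea (collect everything to drop, then filter once) without iterrows, using NumPy. drop_bad_and_previous is a hypothetical helper name; it works on row positions rather than labels, so it does not rely on a RangeIndex:

```python
import numpy as np
import pandas as pd

def drop_bad_and_previous(df, col, n):
    """Hypothetical helper: drop each row where df[col] is True plus the n rows before it."""
    bad_pos = np.flatnonzero(df[col].to_numpy())            # positions of the bad rows
    # For every bad position p, the positions p, p-1, ..., p-n
    to_drop = (bad_pos[:, None] - np.arange(n + 1)).ravel()
    to_drop = to_drop[to_drop >= 0]                          # ignore positions before the first row
    keep = np.ones(len(df), dtype=bool)
    keep[to_drop] = False                                    # overlapping windows are harmless
    return df[keep].reset_index(drop=True)

df = pd.DataFrame({'A': [10, 20, 30, 40, 50], 'isBad?': [False, False, True, False, False]})
print(drop_bad_and_previous(df, 'isBad?', n=2))
```

Because the positions go into a boolean mask, overlapping windows from nearby bad rows need no de-duplication.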
Answer:
Interesting question!
After a bit of exploration, I came up with a pretty short solution:
subset = df[~(df['isBad?'] | df['isBad?'].shift(-1, fill_value=False) | df['isBad?'].shift(-2, fill_value=False))]
Output:
>>> subset
    A  isBad?
3  40   False
4  50   False
A dynamic version of that, so that you can change the number of previous rows dropped without writing more .shift() calls by hand:
import functools as ft
n = 2  # Drop every True row and the n rows before it
subset = df[~ft.reduce(lambda x, y: x | y, [df['isBad?'].shift(-i, fill_value=False) for i in range(n + 1)])]
Output:
>>> subset
    A  isBad?
3  40   False
4  50   False
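With n=60, the reduce above builds 61 shifted copies of the column. A different technique, sketched here as an alternative rather than what this answer uses: reverse the column, take a rolling maximum over a window of n+1 rows, and reverse back, so each row is flagged if it or any of the next n rows is bad:

```python
import pandas as pd

df = pd.DataFrame({'A': [10, 20, 30, 40, 50], 'isBad?': [False, False, True, False, False]})
n = 2  # drop every bad row and the n rows before it

bad = df['isBad?'].astype(int)
# Rolling over the reversed column: position i sees rows i .. i+n of the original order
flag = bad[::-1].rolling(n + 1, min_periods=1).max()[::-1].astype(bool)
subset = df[~flag]  # boolean Series aligns on the index, so the reversal is undone safely
print(subset)
```

min_periods=1 lets the last n rows, whose forward window runs past the end of the frame, still be evaluated.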
Answer:
A fairly straightforward approach is to check, by shifting, whether the current row is bad (True), the next row is bad, or the row two ahead is bad. Then keep only the rows where none of these holds (~):
c1 = df['isBad?'].shift(-2, fill_value=False) # Two rows ahead
c2 = df['isBad?'].shift(-1, fill_value=False) # One row ahead
m = ~(c1 | c2 | df['isBad?']) # NOT (c1 OR c2 OR isBad?)
df = df[m] # Filter rows
df:
    A  isBad?
3  40   False
4  50   False
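The three-term mask above hard-codes n=2. As another hedged sketch for larger n (a swapped-in technique, not part of this answer), NumPy's np.convolve can count the bad rows in each window of n+1 rows starting at the current row; a row is kept only when that count is zero:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [10, 20, 30, 40, 50], 'isBad?': [False, False, True, False, False]})
n = 2  # drop every bad row and the n rows before it

bad = df['isBad?'].to_numpy().astype(int)
# Full convolution: entry k sums bad[k-n .. k], so entry j+n covers rows j .. j+n
window_counts = np.convolve(bad, np.ones(n + 1, dtype=int))[n:]
result = df[window_counts == 0]
print(result)
```

Slicing off the first n entries of the full convolution lines each count up with the row that starts its window.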