Selecting data in rows based on criteria in column-CodePudding

I have xls file with two columns which i would like to read in pandas. Then from dataframe created I would like to output another xls file based on criteria in the column result.

If data in result column equals fail and the output xls file should contain data from 1 row above and 1 row below and data from the fail row itself.

example of input and output shown

input:

Team    result
1   pass
2   pass
3   fail
4   pass
5   pass
6   pass
7   fail
8   pass
9   pass
10  pass
11  pass
12  pass
13  pass
14  fail
15  pass

output:

Team    result
2   pass
3   fail
4   pass
6   pass
7   fail
8   pass
13  pass
14  fail
15  pass

CodePudding user response：

shift is your friend here. You will have to shift the result column up and down, and keep the row if one of the values is fail:

x = (df['result'] == 'fail') | (df['result'].shift() == 'fail') | (
     df['result'].shift(-1) == 'fail')
print(df[x])

gives as expected:

    Team result
1      2   pass
2      3   fail
3      4   pass
5      6   pass
6      7   fail
7      8   pass
12    13   pass
13    14   fail
14    15   pass

CodePudding user response：

Use a centered rolling sum on the boolean Series deriving from equality to your target ('fail'), and perform boolean indexing:

df[df['result'].eq('fail').rolling(3, center=True, min_periods=1).max().eq(1)]

NB. this makes it easy to select more rows below/after. For example to get 2 above 2 below use 5 (2*n 1 for the generic case) as a rolling window.

output:

    Team result
1      2   pass
2      3   fail
3      4   pass
5      6   pass
6      7   fail
7      8   pass
12    13   pass
13    14   fail
14    15   pass

CodePudding user response：

You can add and subtract 1 one from the indexes where df['result'] == 'fail' but this can also result in KeyError if there is a fail on the last row. So we also need to use index.intersection to reach valid indexes:

all_indexes = np.ravel([df[df['result'] == 'fail'].index   i for i in range(-1, 2)])
valid_indexes = df.index.intersection(all_indexes)
df.loc[valid_indexes]

Output:

    result
1   pass
2   fail
3   pass
5   pass
6   fail
7   pass
12  pass
13  fail
14  pass