I have xls file with two columns which i would like to read in pandas. Then from dataframe created I would like to output another xls file based on criteria in the column result.
If data in result column equals fail and the output xls file should contain data from 1 row above and 1 row below and data from the fail row itself.
example of input and output shown
input:
Team result
1 pass
2 pass
3 fail
4 pass
5 pass
6 pass
7 fail
8 pass
9 pass
10 pass
11 pass
12 pass
13 pass
14 fail
15 pass
output:
Team result
2 pass
3 fail
4 pass
6 pass
7 fail
8 pass
13 pass
14 fail
15 pass
CodePudding user response:
shift
is your friend here. You will have to shift the result
column up and down, and keep the row if one of the values is fail
:
x = (df['result'] == 'fail') | (df['result'].shift() == 'fail') | (
df['result'].shift(-1) == 'fail')
print(df[x])
gives as expected:
Team result
1 2 pass
2 3 fail
3 4 pass
5 6 pass
6 7 fail
7 8 pass
12 13 pass
13 14 fail
14 15 pass
CodePudding user response:
Use a centered rolling
sum on the boolean Series deriving from equality to your target ('fail'), and perform boolean indexing:
df[df['result'].eq('fail').rolling(3, center=True, min_periods=1).max().eq(1)]
NB. this makes it easy to select more rows below/after. For example to get 2 above 2 below use 5
(2*n 1 for the generic case) as a rolling window.
output:
Team result
1 2 pass
2 3 fail
3 4 pass
5 6 pass
6 7 fail
7 8 pass
12 13 pass
13 14 fail
14 15 pass
CodePudding user response:
You can add and subtract 1 one from the indexes where df['result'] == 'fail'
but this can also result in KeyError
if there is a fail
on the last row. So we also need to use index.intersection
to reach valid indexes:
all_indexes = np.ravel([df[df['result'] == 'fail'].index i for i in range(-1, 2)])
valid_indexes = df.index.intersection(all_indexes)
df.loc[valid_indexes]
Output:
result
1 pass
2 fail
3 pass
5 pass
6 fail
7 pass
12 pass
13 fail
14 pass