Home > Mobile >  Selecting data in rows based on criteria in column
Selecting data in rows based on criteria in column

Time:09-26

I have xls file with two columns which i would like to read in pandas. Then from dataframe created I would like to output another xls file based on criteria in the column result.

If data in result column equals fail and the output xls file should contain data from 1 row above and 1 row below and data from the fail row itself.

example of input and output shown

input:

Team    result
1   pass
2   pass
3   fail
4   pass
5   pass
6   pass
7   fail
8   pass
9   pass
10  pass
11  pass
12  pass
13  pass
14  fail
15  pass

output:

Team    result
2   pass
3   fail
4   pass
6   pass
7   fail
8   pass
13  pass
14  fail
15  pass

CodePudding user response:

shift is your friend here. You will have to shift the result column up and down, and keep the row if one of the values is fail:

x = (df['result'] == 'fail') | (df['result'].shift() == 'fail') | (
     df['result'].shift(-1) == 'fail')
print(df[x])

gives as expected:

    Team result
1      2   pass
2      3   fail
3      4   pass
5      6   pass
6      7   fail
7      8   pass
12    13   pass
13    14   fail
14    15   pass

CodePudding user response:

Use a centered rolling sum on the boolean Series deriving from equality to your target ('fail'), and perform boolean indexing:

df[df['result'].eq('fail').rolling(3, center=True, min_periods=1).max().eq(1)]

NB. this makes it easy to select more rows below/after. For example to get 2 above 2 below use 5 (2*n 1 for the generic case) as a rolling window.

output:

    Team result
1      2   pass
2      3   fail
3      4   pass
5      6   pass
6      7   fail
7      8   pass
12    13   pass
13    14   fail
14    15   pass

CodePudding user response:

You can add and subtract 1 one from the indexes where df['result'] == 'fail' but this can also result in KeyError if there is a fail on the last row. So we also need to use index.intersection to reach valid indexes:

all_indexes = np.ravel([df[df['result'] == 'fail'].index   i for i in range(-1, 2)])
valid_indexes = df.index.intersection(all_indexes)
df.loc[valid_indexes]

Output:

    result
1   pass
2   fail
3   pass
5   pass
6   fail
7   pass
12  pass
13  fail
14  pass
  • Related