How to find all values of column 1 which don't have any specific entries in colum

I have data with the columns qid, docid, good, fair, and bad. good, fair, and bad are grades from different graders, and I need to find the qid values for which only bad grades exist.

qid  docid good fair bad
2     3     2    3   0 
2     5     1    0   4
3     2     0    0   4
3     4     0    0   3

In this case I want to find qid = 3, because it is the only qid whose rows all contain only "bad" results. I am unable to find an efficient way to do this.

CodePudding user response：

Use a boolean mask:

>>> df[df[['good', 'fair']].eq(0).all(axis=1) & df['bad'].gt(0)]
   qid  docid  good  fair  bad
2    3      2     0     0    4
3    3      4     0     0    3


>>> df.loc[df[['good', 'fair']].eq(0).all(axis=1) & df['bad'].gt(0), 'qid'] \
      .unique().tolist()
[3]
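
Note that the mask above is evaluated row by row, so a qid is reported as soon as one of its rows is bad-only. If "only bad" has to hold for every row of a qid, a stricter variant could look like the sketch below. The DataFrame is rebuilt from the question's sample, and the names row_is_bad_only, all_rows_bad and bad_only_qids are introduced here for illustration only.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "qid":   [2, 2, 3, 3],
    "docid": [3, 5, 2, 4],
    "good":  [2, 1, 0, 0],
    "fair":  [3, 0, 0, 0],
    "bad":   [0, 4, 4, 3],
})

# Row-level flag: no good or fair grades, and at least one bad grade.
row_is_bad_only = df[["good", "fair"]].eq(0).all(axis=1) & df["bad"].gt(0)

# Keep a qid only if every one of its rows is flagged.
all_rows_bad = row_is_bad_only.groupby(df["qid"]).all()
bad_only_qids = all_rows_bad[all_rows_bad].index.tolist()
print(bad_only_qids)
```

On the sample data both versions agree, since neither row of qid 2 is bad-only.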

CodePudding user response：

You can use np.where() as below:

import numpy as np
# Note: the qid to check (3) is hard-coded in the condition.
df['Check'] = np.where((df.qid == 3) & (df.bad > 0), True, False)
df.loc[df['Check'], ['qid', 'docid', 'good', 'fair', 'bad']]
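
If the qid should not be hard-coded, one possible variation (a sketch, not the answerer's code) is to broadcast per-qid totals back onto the rows with groupby(...).transform("sum") and let np.where() label each row. It assumes the grades are non-negative counts, so a zero total means every row of that qid is zero. The column name label and the strings bad_only / mixed are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "qid":   [2, 2, 3, 3],
    "docid": [3, 5, 2, 4],
    "good":  [2, 1, 0, 0],
    "fair":  [3, 0, 0, 0],
    "bad":   [0, 4, 4, 3],
})

# Per-qid totals of good and fair, repeated on every row of that qid.
totals = df.groupby("qid")[["good", "fair"]].transform("sum")

# A qid is bad-only when its good and fair totals are both zero
# (assumption: grades are non-negative counts).
no_good_or_fair = totals.sum(axis=1).eq(0)

# Label each row according to its qid's status.
df["label"] = np.where(no_good_or_fair, "bad_only", "mixed")

result = df.loc[df["label"] == "bad_only", "qid"].unique().tolist()
print(result)
```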

CodePudding user response：

You can use the df.groupby function to get a table with the sum of each grade's occurrences per qid:

>>> filtered_df = df.groupby(by=['qid']).sum().reset_index()
>>> filtered_df
   qid  docid  good  fair  bad
0    2      8     3     3    4
1    3      6     0     0    7

Then you can use df.loc to find the rows with only bad grades:

>>> filtered_df.loc[(filtered_df["good"] == 0) & (filtered_df["fair"] == 0)]

   qid  docid  good  fair  bad
1    3      6     0     0    7
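
For reference, here is a self-contained sketch of the groupby approach on the question's sample data. Two details differ from the snippet above and are assumptions rather than part of the answer: only the grade columns are summed (a summed docid is not meaningful), and bad > 0 is also required so that a qid with no grades at all is not reported. It relies on the grades being non-negative counts. The names grade_totals, only_bad and qids are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "qid":   [2, 2, 3, 3],
    "docid": [3, 5, 2, 4],
    "good":  [2, 1, 0, 0],
    "fair":  [3, 0, 0, 0],
    "bad":   [0, 4, 4, 3],
})

# Sum only the grade columns for each qid; qid becomes the index.
grade_totals = df.groupby("qid")[["good", "fair", "bad"]].sum()

# Combine element-wise conditions with & (Python has no && operator).
only_bad = grade_totals[
    (grade_totals["good"] == 0)
    & (grade_totals["fair"] == 0)
    & (grade_totals["bad"] > 0)
]

qids = only_bad.index.tolist()
print(qids)
```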