Pandas: How to check if the column contains value 0 then sort the selected rows data according to so

I have a data frame

Testcase   Processing_time   Pass   Fail   avg_failure_rate   Ranking_value
   t1         1.102088        8    26        76.47           69.38
   t2         1.718864        19    3        13.63           7.93
   t3         25              22    0         0               0
   t4         15              22    0         0               0

I want to keep the first two test cases as they are in the above data frame, but sort the rest of the test cases by the Processing_time column, shortest first.

Desired output:

Testcase   Processing_time  Pass  Fail  avg_failure_rate  Ranking_value
   t1         1.102088        8    26        76.47           69.38
   t2         1.718864        19    3        13.63           7.93
   t4         15              22    0         0               0
   t3         25              22    0         0               0

Test cases with a Ranking_value equal to 0 should be sorted by Processing_time in ascending order. Is there any way to accomplish this?

CodePudding user response：

Do it in two steps: split the dataframe, then concat the parts back together.

idx = df.index[df.Ranking_value == 0]
# df[idx] would look up columns; use .loc to select rows by index label
out = pd.concat([df.drop(idx), df.loc[idx].sort_values('Processing_time')])
out
Out[120]: 
  Testcase  Processing_time  Pass  Fail  avg_failure_rate  Ranking_value
0       t1         1.102088     8    26             76.47          69.38
1       t2         1.718864    19     3             13.63           7.93
3       t4        15.000000    22     0              0.00           0.00
2       t3        25.000000    22     0              0.00           0.00
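
For anyone who wants to paste and run this, here is a minimal self-contained sketch of the split-and-concat idea. The DataFrame constructor call is my assumption about how df was built (the question only shows the printed frame), and out_renumbered is just an illustrative name.

```python
import pandas as pd

# Data copied from the question; the construction itself is an assumption
df = pd.DataFrame({
    "Testcase": ["t1", "t2", "t3", "t4"],
    "Processing_time": [1.102088, 1.718864, 25.0, 15.0],
    "Pass": [8, 19, 22, 22],
    "Fail": [26, 3, 0, 0],
    "avg_failure_rate": [76.47, 13.63, 0.0, 0.0],
    "Ranking_value": [69.38, 7.93, 0.0, 0.0],
})

# Index labels of the rows whose Ranking_value is 0
idx = df.index[df["Ranking_value"] == 0]

# Rows with a non-zero rank keep their order; zero-ranked rows follow,
# sorted by Processing_time (select rows by label with .loc)
out = pd.concat([df.drop(idx), df.loc[idx].sort_values("Processing_time")])

# Optional: renumber the rows 0..n-1 instead of keeping the old labels
out_renumbered = out.reset_index(drop=True)

print(out)
```

Note that out keeps the original index labels, so the last two rows show up as 3 and 2. Use out_renumbered if you would rather have a fresh 0..3 index.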

CodePudding user response：

Filter the rows with a rank value of 0 using .loc and sort them with .sort_values(). Then concatenate them after the rows whose rank value is not 0 using pd.concat() (DataFrame.append() was deprecated in pandas 1.4 and removed in 2.0), as follows:

pd.concat([df.loc[df['Ranking_value'] != 0], df.loc[df['Ranking_value'] == 0].sort_values('Processing_time')])

Result:

  Testcase  Processing_time  Pass  Fail  avg_failure_rate  Ranking_value
0       t1         1.102088     8    26             76.47          69.38
1       t2         1.718864    19     3             13.63           7.93
3       t4        15.000000    22     0              0.00           0.00
2       t3        25.000000    22     0              0.00           0.00

CodePudding user response：

Try masking.

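mask = df.loc[:, "Ranking_value"] == 0

# Sorting df.loc[mask, :] with inplace=True only sorts a temporary copy, so build a new frame instead
out = pd.concat([df.loc[~mask, :], df.loc[mask, :].sort_values("Processing_time")])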
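
The same kind of boolean mask can also drive a single stable sort instead of a split and concat. This is a different technique from the answers above, sketched under two assumptions: the index labels are unique, and the rows with a non-zero rank should stay in their original order ahead of the zero-ranked ones. The frame construction and the names key and out are mine, not from the question.

```python
import pandas as pd

# Data copied from the question; the construction itself is an assumption
df = pd.DataFrame({
    "Testcase": ["t1", "t2", "t3", "t4"],
    "Processing_time": [1.102088, 1.718864, 25.0, 15.0],
    "Pass": [8, 19, 22, 22],
    "Fail": [26, 3, 0, 0],
    "avg_failure_rate": [76.47, 13.63, 0.0, 0.0],
    "Ranking_value": [69.38, 7.93, 0.0, 0.0],
})

mask = df["Ranking_value"] == 0

# Sort key: Processing_time for zero-ranked rows, -inf for all other rows,
# so the other rows tie with each other and sort ahead of the zero-ranked ones
key = df["Processing_time"].where(mask, float("-inf"))

# A stable sort keeps tied rows in their original order
out = df.loc[key.sort_values(kind="stable").index]

print(out)
```

Whether this is nicer than pd.concat is mostly a matter of taste. It avoids building two intermediate frames, but the split-and-concat version is probably easier to read.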