Pandas: How to custom-sort on multiple columns?-CodePudding

I have a pandas dataframe with data like:

 ----------- ----------------- --------- 
| JOB-NAME  |  Status         | SLA     |
 ----------- ----------------- --------- 
| job_1     | YET_TO_START    | ---     |
| job_3     | COMPLETED       | MET     |
| job_4     | RUNNING         | MET     |
| job_2     | YET_TO_START    | LATE    |
| job_6     | RUNNING         | LATE    |
| job_5     | FAILED          | LATE    |
| job_7     | YET_TO_START    | ---     |
| job_8     | COMPLETED       | NOT_MET |
 ----------- ----------------- ---------

I need to sort this table based on the Status and SLA states, like for Status: FAILED will be top on the table, then YET_TO_START, then RUNNING, and finally COMPLETED. Similarly for SLA the order will be LATE, ---, NOT_MET, and MET. Like this:

 ----------- ----------------- --------- 
| JOB-NAME  |  Status         | SLA     |
 ----------- ----------------- --------- 
| job_5     | FAILED          | LATE    |
| job_2     | YET_TO_START    | LATE    |
| job_1     | YET_TO_START    | ---     |
| job_7     | YET_TO_START    | ---     |
| job_6     | RUNNING         | LATE    |
| job_4     | RUNNING         | MET     |
| job_8     | COMPLETED       | NOT_MET |
| job_3     | COMPLETED       | MET     |
 ----------- ----------------- ---------

I am able to do this custom sorting priority-based only on single column Status, but unable to do for multiple columns.

sort_order_dict = {"FAILED":0, "YET_TO_START":1, "RUNNING":2, "COMPLETED":3}
joined_df = joined_df.sort_values(by=['status'], key=lambda x: x.map(sort_order_dict))

A solution is given here, but its for single column, not multiple column.

CodePudding user response：

You can extend dictionary by values from another columns, only necessary different keys in both columns for correct working like mentioned mozway in comments:

sort_order_dict = {"FAILED":0, "YET_TO_START":1, "RUNNING":2, "COMPLETED":3,
                   "LATE":4, "---":5, "NOT_MET":6, "MET":7}
df = df.sort_values(by=['Status','SLA'], key=lambda x: x.map(sort_order_dict))
print (df)
  JOB-NAME        Status      SLA
5    job_5        FAILED     LATE
3    job_2  YET_TO_START     LATE
0    job_1  YET_TO_START      ---
6    job_7  YET_TO_START      ---
4    job_6       RUNNING     LATE
2    job_4       RUNNING      MET
7    job_8     COMPLETED  NOT_MET
1    job_3     COMPLETED      MET

Or use ordered Categorical:

df['Status'] = pd.Categorical(df['Status'], ordered=True, 
                              categories=['FAILED', 'YET_TO_START', 'RUNNING', 'COMPLETED'])
                                          
df['SLA'] = pd.Categorical(df['SLA'], ordered=True, 
                              categories= ['LATE', '---', 'NOT_MET', 'MET'])
df = df.sort_values(by=['Status','SLA'])
print (df)
  JOB-NAME        Status      SLA
5    job_5        FAILED     LATE
3    job_2  YET_TO_START     LATE
0    job_1  YET_TO_START      ---
6    job_7  YET_TO_START      ---
4    job_6       RUNNING     LATE
2    job_4       RUNNING      MET
7    job_8     COMPLETED  NOT_MET
1    job_3     COMPLETED      MET

CodePudding user response：

Use numpy.lexsort, you can use any number of parameters easily:

import numpy as np

sort_order_dict = {"FAILED":0, "YET_TO_START":1, "RUNNING":2, "COMPLETED":3}
sort_order_dict2 = {'LATE': 0, '---': 1, 'NOT_MET': 2, 'MET': 3}

order = np.lexsort([df['SLA'].map(sort_order_dict2),
                    df['Status'].map(sort_order_dict),
                    ])

out = df.iloc[order]

Output:

  JOB-NAME        Status      SLA
5    job_5        FAILED     LATE
3    job_2  YET_TO_START     LATE
0    job_1  YET_TO_START      ---
6    job_7  YET_TO_START      ---
4    job_6       RUNNING     LATE
2    job_4       RUNNING      MET
7    job_8     COMPLETED  NOT_MET
1    job_3     COMPLETED      MET