I have a pandas dataframe with data like:
----------- ----------------- ---------
| JOB-NAME | Status | SLA |
----------- ----------------- ---------
| job_1 | YET_TO_START | --- |
| job_3 | COMPLETED | MET |
| job_4 | RUNNING | MET |
| job_2 | YET_TO_START | LATE |
| job_6 | RUNNING | LATE |
| job_5 | FAILED | LATE |
| job_7 | YET_TO_START | --- |
| job_8 | COMPLETED | NOT_MET |
----------- ----------------- ---------
I need to sort this table based on the Status and SLA states, like for Status: FAILED will be top on the table, then YET_TO_START, then RUNNING, and finally COMPLETED. Similarly for SLA the order will be LATE, ---, NOT_MET, and MET. Like this:
----------- ----------------- ---------
| JOB-NAME | Status | SLA |
----------- ----------------- ---------
| job_5 | FAILED | LATE |
| job_2 | YET_TO_START | LATE |
| job_1 | YET_TO_START | --- |
| job_7 | YET_TO_START | --- |
| job_6 | RUNNING | LATE |
| job_4 | RUNNING | MET |
| job_8 | COMPLETED | NOT_MET |
| job_3 | COMPLETED | MET |
----------- ----------------- ---------
I am able to do this custom sorting priority-based only on single column Status, but unable to do for multiple columns.
sort_order_dict = {"FAILED":0, "YET_TO_START":1, "RUNNING":2, "COMPLETED":3}
joined_df = joined_df.sort_values(by=['status'], key=lambda x: x.map(sort_order_dict))
A solution is given here, but its for single column, not multiple column.
CodePudding user response:
You can extend dictionary by values from another columns, only necessary different keys in both columns for correct working like mentioned mozway in comments:
sort_order_dict = {"FAILED":0, "YET_TO_START":1, "RUNNING":2, "COMPLETED":3,
"LATE":4, "---":5, "NOT_MET":6, "MET":7}
df = df.sort_values(by=['Status','SLA'], key=lambda x: x.map(sort_order_dict))
print (df)
JOB-NAME Status SLA
5 job_5 FAILED LATE
3 job_2 YET_TO_START LATE
0 job_1 YET_TO_START ---
6 job_7 YET_TO_START ---
4 job_6 RUNNING LATE
2 job_4 RUNNING MET
7 job_8 COMPLETED NOT_MET
1 job_3 COMPLETED MET
Or use ordered Categorical
:
df['Status'] = pd.Categorical(df['Status'], ordered=True,
categories=['FAILED', 'YET_TO_START', 'RUNNING', 'COMPLETED'])
df['SLA'] = pd.Categorical(df['SLA'], ordered=True,
categories= ['LATE', '---', 'NOT_MET', 'MET'])
df = df.sort_values(by=['Status','SLA'])
print (df)
JOB-NAME Status SLA
5 job_5 FAILED LATE
3 job_2 YET_TO_START LATE
0 job_1 YET_TO_START ---
6 job_7 YET_TO_START ---
4 job_6 RUNNING LATE
2 job_4 RUNNING MET
7 job_8 COMPLETED NOT_MET
1 job_3 COMPLETED MET
CodePudding user response:
Use numpy.lexsort
, you can use any number of parameters easily:
import numpy as np
sort_order_dict = {"FAILED":0, "YET_TO_START":1, "RUNNING":2, "COMPLETED":3}
sort_order_dict2 = {'LATE': 0, '---': 1, 'NOT_MET': 2, 'MET': 3}
order = np.lexsort([df['SLA'].map(sort_order_dict2),
df['Status'].map(sort_order_dict),
])
out = df.iloc[order]
Output:
JOB-NAME Status SLA
5 job_5 FAILED LATE
3 job_2 YET_TO_START LATE
0 job_1 YET_TO_START ---
6 job_7 YET_TO_START ---
4 job_6 RUNNING LATE
2 job_4 RUNNING MET
7 job_8 COMPLETED NOT_MET
1 job_3 COMPLETED MET