Home > Software design >  Sort by given "rank" of column values
Sort by given "rank" of column values

Time:01-19

I have a table like this (unsorted):

risk category
Low A
Medium B
High C
Medium A
Low B
High A
Low C
Low E
Low D
High B

I need to sort rows by category, but first based on the value of risk. The desired result should look like this (sorted):

risk category
Low A
Low B
Low C
Low D
Low E
Medium A
Medium B
High A
High B
High C

I've come up with below query but wonder if it is correct:

SELECT
    *
FROM
    some_table
ORDER BY
    CASE
    WHEN risk = 'Low'    THEN
    0
    WHEN risk = 'Medium' THEN
    1
    WHEN risk = 'High'   THEN
    2
    ELSE
    3
    END,
    category;

Just want to understand whether the query is correct or not. The actual data set is huge and there are many other values for risk and categories and hence I can't figure out if the results are correct or not. I've just simplified it here.

CodePudding user response:

Basically correct, but you can simplify:

SELECT *
FROM   some_table
ORDER  BY CASE risk
            WHEN 'Low'    THEN 0
            WHEN 'Medium' THEN 1
            WHEN 'High'   THEN 2
            -- rest defaults to NULL and sorts last
          END
        , category;

A "switched" CASE is shorter and slightly cheaper. In the absence of an ELSE branch, all remaining cases default to NULL, and NULL sorts last in default ascending sort order. So you don't need to do anything extra.

Many other values

... there are many other values for risk

While all other values are lumped together at the bottom of the sort order, this seems ok.
If all of those many values get their individual ranking, I would suggest an additional table to handle ranks of risk values. Like:

CREATE TABLE riskrank (
  risk text PRIMARY KEY
, riskrank real
);

INSERT INTO riskrank VALUES
  ('Low'   , 0)
, ('Medium', 1)
, ('High'  , 2)
-- many more?
; 

Data type real, so it's easy to squeeze in rows with fractional digits in different positions (like enum values do it internally).

Then your query is:

SELECT s.*
FROM   some_table s
LEFT   JOIN risk_rank rr USING (risk)
ORDER  BY rr.riskrank, s.category;

LEFT JOIN, so missing entries in riskrank don't eliminate rows.

enum?

I already mentioned the data type enum. That's a possible alternative as enum values are sorted in the order they are defined (not how they are spelled). They only occupy 4 bytes on disk (real internally), are fast and enforce valid values implicitly. See:

However, I would only even consider an enum if the sort order of your values is immutable. Changing sort order and adding / removing allowed values is cumbersome. The manual:

Although enum types are primarily intended for static sets of values, there is support for adding new values to an existing enum type, and for renaming values (see ALTER TYPE). Existing values cannot be removed from an enum type, nor can the sort ordering of such values be changed, short of dropping and re-creating the enum type.

  • Related