How can I select rows in an SQL table by using multiple substrings of a single column value?-CodePudding

I am using Oracle SQL Developer to query a database of cases, and I need to use three different identifiers to select the correct row for each case; however, my problem is that two of the identifiers are contained in the same text string, and I can’t figure out how to parse them to use in the query.

In the following table, the identifiers I need to use are:

ID-1, characters 3–6 in the "Case" column (e.g., "1001"),
ID-2, characters 8–9 in the "Case" column (e.g., "01") when they appear (treated as "00" if they don't appear), and
ID-3, the value in the "Sequence" column (e.g., "672").

Source Table

Case	Sequence	Value 1	Value 2
AA1001	672	73	195
AA1001	711	73	185
AA1001-01	680	73	185
AA1001-02	685	72	185
AA1001-02	699	72	182
AB1002	676	51	36
AB1002-01	701	48	39
AB1002-01	719	48	35
AB1002-02	707	51	38
AA1003	655	122	416
AA1003	683	113	416

I want to return one row for each unique ID-1, such that first the greatest value for ID-2 is selected, and then the greatest value for ID-3 in that subset is chosen; so, the query should return only the following three rows from the table above.

Result Table

Case	Sequence	Value 1	Value 2
AA1001-02	699	72	182
AB1002-02	707	51	38
AA1003	683	113	416

I've tried taking the maximum for ID-2 using the following, but it only returns the rows where ID-2 equals "02".

SELECT * 
FROM table 
WHERE SUBSTR(Case,3,4) in ('1001','1002','1003') 
and SUBSTR(Case,8,2) = (SELECT MAX(SUBSTR(Case,8,2)) 
FROM table 
WHERE SUBSTR(Case,3,4) in ('1001','1002','1003'))

(The easiest answer is probably just to add a column for ID-2; however, the source database is strictly read-only, so I can't make that sort of change.)

CodePudding user response：

In Oracle 12.1 and higher, you can use the match_recognize clause:

select case, sequence, value1, value2
from   (
         select t.*, substr(case, 3, 4) as id_1, substr(case, 8, 9) as id_2
         from   table_name t
       )
match_recognize(
  partition by id_1
  order     by id_2 desc nulls last, sequence desc nulls last
  all rows per match
  pattern   (^ x)
  define    x as null is null
);


CASE        SEQUENCE     VALUE1     VALUE2
--------- ---------- ---------- ----------
AA1001-02        699         72        182
AB1002-02        707         51         38
AA1003           683        113        416

CodePudding user response：

You can use the ROW_NUMBER() analytic function and find the SUBSTRings to partition/order by:

SELECT *
FROM   (
  SELECT t.*,
         ROW_NUMBER() OVER (
           PARTITION BY
             SUBSTR(Case, 3, 4)
           ORDER BY
             COALESCE(SUBSTR(Case, 8, 2), '00') DESC,
             Sequence DESC
         ) AS rn
  FROM   table_name t
)
WHERE rn = 1;

Which, for your sample data:

CREATE TABLE table_name (Case, Sequence, Value1, Value2 ) AS
SELECT 'AA1001',    672,  73, 195 FROM DUAL UNION ALL
SELECT 'AA1001',    711,  73, 185 FROM DUAL UNION ALL
SELECT 'AA1001-01', 680,  73, 185 FROM DUAL UNION ALL
SELECT 'AA1001-02', 685,  72, 185 FROM DUAL UNION ALL
SELECT 'AA1001-02', 699,  72, 182 FROM DUAL UNION ALL
SELECT 'AB1002',    676,  51,  36 FROM DUAL UNION ALL
SELECT 'AB1002-01', 701,  48,  39 FROM DUAL UNION ALL
SELECT 'AB1002-01', 719,  48,  35 FROM DUAL UNION ALL
SELECT 'AB1002-02', 707,  51,  38 FROM DUAL UNION ALL
SELECT 'AA1003',    655, 122, 416 FROM DUAL UNION ALL
SELECT 'AA1003',    683, 113, 416 FROM DUAL;

Outputs:

CASE SEQUENCE VALUE1 VALUE2 RN

AA1001-02 699 72 182 1

AB1002-02 707 51 38 1

AA1003 683 113 416 1

CASE	SEQUENCE	VALUE1	VALUE2	RN
AA1001-02	699	72	182	1
AB1002-02	707	51	38	1
AA1003	683	113	416	1

db<>fiddle here