Home > other >  Selecting Different Rows in a Group based on condition
Selecting Different Rows in a Group based on condition

Time:05-03

I have the following problem with selecting rows in a group: I have records from different data sources, each data source has a priority assigned to it. Whenever I have two identical records with different data sources, I want to select the one with the lowest priority. Whenever I have duplicate records from the same data source, but with different entries in the VALUE column, I want to select the row with 't' (if there are several, take the first).

Here's an example dataset:

Example Dataset

And the expected result:

Expected result

What I did until now was to select those entries, that had the lowest prio inside a group. Here's a code example:

WITH Product AS (
SELECT 1 id, 'A' code, 'f' value, 'S1' datasource, 1 prio FROM DUAL 
UNION ALL
SELECT 1 id, 'A' code, 'f' value, 'S2' datasource, 2 prio FROM DUAL 
UNION ALL 
SELECT 3 id, 'A' code, 'f' value, 'S3' datasource, 3 prio FROM DUAL 
UNION ALL 
SELECT 2 id, 'A' code, 'f' value, 'S1' datasource, 1 prio FROM DUAL 
UNION ALL 
SELECT 2 id, 'B' code, 't' value, 'S1' datasource, 1 prio FROM DUAL 
),
extended_product AS
(
SELECT p.*, ROW_NUMBER() OVER(PARTITION BY p.ID ORDER BY p.PRIO ASC) AS ROW_NUMBER
FROM Product p 
)
SELECT ep.*
FROM extended_product ep
WHERE ROW_NUMBER = 1;

I don't know how to do both conditions, selecting lowest prio and in case all entries inside the group have the same datasource, take only those with 't' (and take the first one if there are several).

CodePudding user response:

You're close, I think. row_number misses the value part:

ROW_NUMBER () OVER (PARTITION BY p.ID ORDER BY p.PRIO ASC, VALUE DESC) AS ROW_NUMBER

SQL> WITH
  2     Product
  3     AS
  4        (SELECT 1 id, 'A' code, 'f' VALUE, 'S1' datasource, 1 prio FROM DUAL
  5         UNION ALL
  6         SELECT 1 id, 'B' code, 'f' VALUE, 'S2' datasource, 2 prio FROM DUAL
  7         UNION ALL
  8         SELECT 3 id, 'C' code, 'f' VALUE, 'S3' datasource, 3 prio FROM DUAL
  9         UNION ALL
 10         SELECT 2 id, 'D' code, 'f' VALUE, 'S1' datasource, 1 prio FROM DUAL
 11         UNION ALL
 12         SELECT 2 id, 'E' code, 't' VALUE, 'S1' datasource, 1 prio FROM DUAL),
 13     extended_product
 14     AS
 15        (SELECT p.*,
 16                ROW_NUMBER ()
 17                   OVER (PARTITION BY p.ID ORDER BY p.PRIO ASC, VALUE DESC) AS ROW_NUMBER
 18           FROM Product p)
 19    SELECT ep.*
 20      FROM extended_product ep
 21     WHERE ep.ROW_NUMBER = 1
 22  ORDER BY ID, PRIO;

        ID C V DA       PRIO ROW_NUMBER
---------- - - -- ---------- ----------
         1 A f S1          1          1
         2 E t S1          1          1
         3 C f S3          3          1

SQL>

CodePudding user response:

As littlefoot mentioned in their answer and based on your requirement, We will be additionally ordering by VALUE(descending) in case there is more than 1 row of same priorities for a particular ID.

WITH Product AS 
(
SELECT 1 id, 'A' code, 'f' value, 'S1' datasource, 1 prio 
UNION ALL
SELECT 1 id, 'B' code, 'f' value, 'S2' datasource, 2 prio 
UNION ALL 
SELECT 3 id, 'C' code, 'f' value, 'S3' datasource, 3 prio 
UNION ALL 
SELECT 2 id, 'D' code, 'f' value, 'S1' datasource, 1 prio 
UNION ALL 
SELECT 2 id, 'E' code, 't' value, 'S1' datasource, 1 prio 
),
extended_product AS
(
  SELECT 
    p.*,
    RANK() OVER (PARTITION BY p.ID ORDER BY p.PRIO ASC, value DESC) as rank_priority
  FROM Product p 
)
SELECT ep.* FROM extended_product ep WHERE rank_priority = 1;
  • Related