I have the following problem with selecting rows in a group: I have records from different data sources, each data source has a priority assigned to it. Whenever I have two identical records with different data sources, I want to select the one with the lowest priority. Whenever I have duplicate records from the same data source, but with different entries in the VALUE column, I want to select the row with 't' (if there are several, take the first).
Here's an example dataset:
And the expected result:
What I did until now was to select those entries, that had the lowest prio inside a group. Here's a code example:
WITH Product AS (
SELECT 1 id, 'A' code, 'f' value, 'S1' datasource, 1 prio FROM DUAL
UNION ALL
SELECT 1 id, 'A' code, 'f' value, 'S2' datasource, 2 prio FROM DUAL
UNION ALL
SELECT 3 id, 'A' code, 'f' value, 'S3' datasource, 3 prio FROM DUAL
UNION ALL
SELECT 2 id, 'A' code, 'f' value, 'S1' datasource, 1 prio FROM DUAL
UNION ALL
SELECT 2 id, 'B' code, 't' value, 'S1' datasource, 1 prio FROM DUAL
),
extended_product AS
(
SELECT p.*, ROW_NUMBER() OVER(PARTITION BY p.ID ORDER BY p.PRIO ASC) AS ROW_NUMBER
FROM Product p
)
SELECT ep.*
FROM extended_product ep
WHERE ROW_NUMBER = 1;
I don't know how to do both conditions, selecting lowest prio and in case all entries inside the group have the same datasource, take only those with 't' (and take the first one if there are several).
CodePudding user response:
You're close, I think. row_number
misses the value
part:
ROW_NUMBER () OVER (PARTITION BY p.ID ORDER BY p.PRIO ASC, VALUE DESC) AS ROW_NUMBER
SQL> WITH
2 Product
3 AS
4 (SELECT 1 id, 'A' code, 'f' VALUE, 'S1' datasource, 1 prio FROM DUAL
5 UNION ALL
6 SELECT 1 id, 'B' code, 'f' VALUE, 'S2' datasource, 2 prio FROM DUAL
7 UNION ALL
8 SELECT 3 id, 'C' code, 'f' VALUE, 'S3' datasource, 3 prio FROM DUAL
9 UNION ALL
10 SELECT 2 id, 'D' code, 'f' VALUE, 'S1' datasource, 1 prio FROM DUAL
11 UNION ALL
12 SELECT 2 id, 'E' code, 't' VALUE, 'S1' datasource, 1 prio FROM DUAL),
13 extended_product
14 AS
15 (SELECT p.*,
16 ROW_NUMBER ()
17 OVER (PARTITION BY p.ID ORDER BY p.PRIO ASC, VALUE DESC) AS ROW_NUMBER
18 FROM Product p)
19 SELECT ep.*
20 FROM extended_product ep
21 WHERE ep.ROW_NUMBER = 1
22 ORDER BY ID, PRIO;
ID C V DA PRIO ROW_NUMBER
---------- - - -- ---------- ----------
1 A f S1 1 1
2 E t S1 1 1
3 C f S3 3 1
SQL>
CodePudding user response:
As littlefoot mentioned in their answer and based on your requirement, We will be additionally ordering by VALUE(descending) in case there is more than 1 row of same priorities for a particular ID.
WITH Product AS
(
SELECT 1 id, 'A' code, 'f' value, 'S1' datasource, 1 prio
UNION ALL
SELECT 1 id, 'B' code, 'f' value, 'S2' datasource, 2 prio
UNION ALL
SELECT 3 id, 'C' code, 'f' value, 'S3' datasource, 3 prio
UNION ALL
SELECT 2 id, 'D' code, 'f' value, 'S1' datasource, 1 prio
UNION ALL
SELECT 2 id, 'E' code, 't' value, 'S1' datasource, 1 prio
),
extended_product AS
(
SELECT
p.*,
RANK() OVER (PARTITION BY p.ID ORDER BY p.PRIO ASC, value DESC) as rank_priority
FROM Product p
)
SELECT ep.* FROM extended_product ep WHERE rank_priority = 1;