I have a table with products that can be identified via a composite key of three keys refi
, refj
and refk
.
refi can not be null, refj and refk can be null.
The keys have an order in trustworthiness power. Therefore refi has the lowest thrustworthiness power, refj medium and refk the highest trustworthiness power.
A product p can upgrade its trustworthiness key through time and thus at one point it can be identified via refi and at another point in time via refi and refk for exemple. This time will be represented as time when the product was scaned - column t.
Rule 1 : The goal is to identify a unique product through time by corresponding at least one of the keys in the table - that is the non null key that has the highest thrustworthiness power.
Rule 2: Another rule is that the order matters in the sense that such correspondance should'nt be made in the case if a product downgrades its trustworthiness key through time.
Rule 3: And last and final rule - when 2 products p1 and p2 are identified as a potential correspondance to a third product p3, it is the product with the common key with the highest trustworthiness power that is finally corresponded with p3.
Here are some exemples:
Exemple 1 and exemple 2 illustrate rule number 1.
Exemple 1:
t | refi | refj | refk |
---|---|---|---|
t1 | refi1 | NULL | NULL |
t2 | refi1 | refj1 | NULL |
t3 | refi1 | refj1 | refk |
In this exemple I want to be able to count these three rows as 1 unique product because they all have the same non null refi1. It is also an exemple of rule number 3 where the 2 products identified as a potential correspondance to a third, are also corresponding to eachother.
Exemple 2:
t | refi | refj | refk |
---|---|---|---|
t1 | refi1 | refj1 | NULL |
t2 | refi2 | refj1 | NULL |
In this exemple I want to be able to count these two rows as 1 unique product because they have a common refj and refj has a higher trustworthiness power than refi. So even if the refi keys are different, it is the refj that counts.
Exemple 3:
t | refi | refj | refk |
---|---|---|---|
t1 | refi1 | refj1 | refk1 |
t2 | refi2 | refj1 | NULL |
In this exemple I want to illustrate rule number 2. In this cas I want to count these 2 rows as two separate products because downgrading to a key with less trustworthiness power is not possible so in this case we suppose it's another product.
Exemple 4:
t | refi | refj | refk |
---|---|---|---|
t1 | refi1 | NULL | NULL |
t2 | refi2 | refj1 | NULL |
t3 | refi1 | refj1 | refk1 |
In this exemple I want to illustrate rule number 3. In this case 1st and 2nd row are potential correspondances for the product in 3d row. The 1st based on refi1 and the 2nd row product based on refj1. As refj has a higher trustworthiness power than refi, in this case the 1st row is identified as 1 product and the last 2 rows as the same product.
Goal : Based on these rules i need to find a SQL query that treats these cases in one way or another - it can be counting the unique products or associating unique keys in a separate column directly.
Any help, idea of an algorithm or a suggestion of how to treat the distinct cases will be appreciated!
CodePudding user response:
You can work separately on each "ref" by applying self joins on the same "ref" value, then apply a UNION
to the results, though fixing priority according to the one pointed in the post (k > j > i).
WITH
cte_refk AS (
SELECT t1.t AS t1_t, t2.t AS t2_t
FROM tab t1
INNER JOIN tab t2 ON t1.refk = t2.refk
AND t1.t > t2.t ),
cte_refj AS (
SELECT t1.t AS t1_t, t2.t AS t2_t
FROM tab t1
INNER JOIN tab t2 ON t1.refj = t2.refj
AND t1.t > t2.t ),
cte_refi AS (
SELECT t1.t AS t1_t, t2.t AS t2_t
FROM tab t1
INNER JOIN tab t2 ON t1.refi = t2.refi
AND t1.t > t2.t )
SELECT *
FROM cte_refk
UNION ALL
SELECT *
FROM cte_refj
WHERE t1_t NOT IN (SELECT t1_t FROM cte_refk)
UNION ALL
SELECT *
FROM cte_refi
WHERE t1_t NOT IN (SELECT t1_t FROM cte_refk)
AND t1_t NOT IN (SELECT t1_t FROM cte_refj)
This query will get you the matching rows.
Feel free to play with it using your examples at this fiddle.