Home > Net >  SQL : Join rows on dynamic keys
SQL : Join rows on dynamic keys

Time:07-07

I have a table with products that can be identified via a composite key of three keys refi, refj and refk. refi can not be null, refj and refk can be null. The keys have an order in trustworthiness power. Therefore refi has the lowest thrustworthiness power, refj medium and refk the highest trustworthiness power.

A product p can upgrade its trustworthiness key through time and thus at one point it can be identified via refi and at another point in time via refi and refk for exemple. This time will be represented as time when the product was scaned - column t.

Rule 1 : The goal is to identify a unique product through time by corresponding at least one of the keys in the table - that is the non null key that has the highest thrustworthiness power.

Rule 2: Another rule is that the order matters in the sense that such correspondance should'nt be made in the case if a product downgrades its trustworthiness key through time.

Rule 3: And last and final rule - when 2 products p1 and p2 are identified as a potential correspondance to a third product p3, it is the product with the common key with the highest trustworthiness power that is finally corresponded with p3.

Here are some exemples:

Exemple 1 and exemple 2 illustrate rule number 1.

Exemple 1:

t refi refj refk
t1 refi1 NULL NULL
t2 refi1 refj1 NULL
t3 refi1 refj1 refk

In this exemple I want to be able to count these three rows as 1 unique product because they all have the same non null refi1. It is also an exemple of rule number 3 where the 2 products identified as a potential correspondance to a third, are also corresponding to eachother.

Exemple 2:

t refi refj refk
t1 refi1 refj1 NULL
t2 refi2 refj1 NULL

In this exemple I want to be able to count these two rows as 1 unique product because they have a common refj and refj has a higher trustworthiness power than refi. So even if the refi keys are different, it is the refj that counts.

Exemple 3:

t refi refj refk
t1 refi1 refj1 refk1
t2 refi2 refj1 NULL

In this exemple I want to illustrate rule number 2. In this cas I want to count these 2 rows as two separate products because downgrading to a key with less trustworthiness power is not possible so in this case we suppose it's another product.

Exemple 4:

t refi refj refk
t1 refi1 NULL NULL
t2 refi2 refj1 NULL
t3 refi1 refj1 refk1

In this exemple I want to illustrate rule number 3. In this case 1st and 2nd row are potential correspondances for the product in 3d row. The 1st based on refi1 and the 2nd row product based on refj1. As refj has a higher trustworthiness power than refi, in this case the 1st row is identified as 1 product and the last 2 rows as the same product.

Goal : Based on these rules i need to find a SQL query that treats these cases in one way or another - it can be counting the unique products or associating unique keys in a separate column directly.

Any help, idea of an algorithm or a suggestion of how to treat the distinct cases will be appreciated!

CodePudding user response:

You can work separately on each "ref" by applying self joins on the same "ref" value, then apply a UNION to the results, though fixing priority according to the one pointed in the post (k > j > i).

WITH 
    cte_refk AS (
        SELECT t1.t AS t1_t, t2.t AS t2_t
        FROM       tab t1
        INNER JOIN tab t2 ON t1.refk = t2.refk
                         AND t1.t    > t2.t    ),
    cte_refj AS (
        SELECT t1.t AS t1_t, t2.t AS t2_t
        FROM       tab t1
        INNER JOIN tab t2 ON t1.refj = t2.refj
                         AND t1.t    > t2.t    ),
    cte_refi AS (
        SELECT t1.t AS t1_t, t2.t AS t2_t
        FROM       tab t1
        INNER JOIN tab t2 ON t1.refi = t2.refi
                         AND t1.t    > t2.t    )
SELECT * 
FROM cte_refk

UNION ALL

SELECT * 
FROM cte_refj 
WHERE t1_t NOT IN (SELECT t1_t FROM cte_refk)

UNION ALL

SELECT * 
FROM cte_refi
WHERE t1_t NOT IN (SELECT t1_t FROM cte_refk)
  AND t1_t NOT IN (SELECT t1_t FROM cte_refj)

This query will get you the matching rows.

Feel free to play with it using your examples at this fiddle.

  • Related