joining the result of inner join back onto the original table-CodePudding

I have the following two tables:

claims_tbl

client_ID claim_ID date
personA   claim1   date1
personB   claim2   date2
personB   claim3   date3
personB   claim4   date4
personC   claim5   date5
personD   claim6   date5

procedures_tbl

claim_id  procedure_code
claim1    hash1
claim2    hash27
claim3    hash1
claim4    hash45
claim5    hash22
claim6    hash1

I would like to extract the claim_IDs associated with hash1 from claims_tbl then, extract the client_ID associated with that claimID. Then use that to extract all the procedures associated with that client_ID.

I have:

select *
from ((select distinct c.claims_id
      from procedures_tbl p
      inner join claims_tbl c
      on p.claim_id=c.claim_id
      and p.procedure_code = 'hash1') t
inner join claims_tbl c
on c.claims_id=t.claims_id) t
inner join from procedures_tbl p
on t.claims_id=p.claims_id

Is there a more efficient way to do this?

my expected table would be:

client_ID claim_ID date  procedure_code
personA   claim1   date1 hash1
personB   claim2   date2 hash27
personB   claim3   date3 hash1
personB   claim4   date4 hash45
personD   claim6   date5 hash1

CodePudding user response：

so I just rewrote the layout of you SQL a little bit:

select *
from (
    select * from
    (
        select distinct 
            c.claims_id
        from procedures_tbl p
        join claims_tbl c
          on p.claim_id = c.claim_id
            and p.procedure_code = 'hash1'
    ) t1
    join claims_tbl c
        on c.claims_id = t1.claims_id
) t2
join from procedures_tbl p
    on t2.claims_id=p.claims_id

t1 is select the distinct set if claims_id's that are both in prcedures and claims tables, that also have procedure_code_hash1.

This is used to select all claims with the same id

which is then used to select all procedures with the same claim id.

the distinct in the t1 block implies there is more than one to one relations between these two tables.

So making some fake data that represents this:

WITH procedures_tbl(claim_id, claims_id, procedure_code,p_details) AS (
    SELECT * FROM VALUES
        (1, 10, 'hash1', 'pd 1'),
        (2, 10, 'hash1', 'pd 2'),
        (3, 11, 'hash1', 'pd 3'),
        (4, 11, 'hash1', 'pd 4')
), claims_tbl(claim_id, claims_id, c_details) AS (
    SELECT * FROM VALUES
        (1, 10, 'cd 1'),
        (2, 10, 'cd 2'),
        (3, 11, 'cd 3'),
        (4, 11, 'cd 4')
)

and then modifying the SQL so it runs (in Snowflake)

select p.*
    ,t2.*
from (
    select c.* from
    (
        select distinct 
            c.claims_id
        from procedures_tbl p
        join claims_tbl c
          on p.claim_id = c.claim_id
            and p.procedure_code = 'hash1'
    ) t1
    join claims_tbl c
        on c.claims_id = t1.claims_id
) t2
join procedures_tbl p
    on t2.claims_id = p.claims_id
order by 1,2,3,4;

CLAIM_ID	CLAIMS_ID	PROCEDURE_CODE	P_DETAILS	CLAIM_ID	CLAIMS_ID	C_DETAILS
1	10	hash1	pd 1	1	10	cd 1
1	10	hash1	pd 1	2	10	cd 2
2	10	hash1	pd 2	1	10	cd 1
2	10	hash1	pd 2	2	10	cd 2
3	11	hash1	pd 3	3	11	cd 3
3	11	hash1	pd 3	4	11	cd 4
4	11	hash1	pd 4	3	11	cd 3
4	11	hash1	pd 4	4	11	cd 4

so this gets gets the full procedures_tbl joined claims_tbl for any claims_id where any one entry in has a code = hash1

This is the same as:

SELECT p.*
    ,c.*
FROM procedures_tbl p
JOIN claims_tbl c
    ON c.claims_id = p.claims_id
WHERE c.claims_id IN (
    select 
        sc.claims_id
    from procedures_tbl sp
    join claims_tbl sc
        on sp.claim_id = sc.claim_id
            and sp.procedure_code = 'hash1'
);

And they should have the same execution plan. But I would guess your plan will be faster in Snowflake, because if procedures_tbl and claims_tbl are large, the WHERE IN filter might not always be run soon enough.

I would be inclined to write it this way as it's the same:

select 
    p.*
    ,c.*
from (
    select distinct 
        c.claims_id
    from procedures_tbl p
    join claims_tbl c
        on p.claim_id = c.claim_id
            and p.procedure_code = 'hash1'
) d 
join claims_tbl c
    on c.claims_id = d.claims_id
join procedures_tbl p
    on p.claims_id = d.claims_id 
order by 1,2,3,4;

thus with this slightly expanded data:

WITH procedures_tbl(claim_id, claims_id, procedure_code,p_details) AS (
    SELECT * FROM VALUES
        (1, 10, 'hash1', 'pd 1'),
        (2, 10, 'hash2', 'pd 2'),
        (3, 11, 'hash1', 'pd 3'),
        (4, 11, 'hash3', 'pd 4'),
        (5, 12, 'hash2', 'pd 5'),
        (6, 12, 'hash3', 'pd 6')
), claims_tbl(claim_id, claims_id, c_details) AS (
    SELECT * FROM VALUES
        (1, 10, 'cd 1'),
        (2, 10, 'cd 2'),
        (3, 11, 'cd 3'),
        (4, 11, 'cd 4'),
        (5, 12, 'cd 5'),
        (6, 12, 'cd 6')
)

we get:

CLAIM_ID	CLAIMS_ID	PROCEDURE_CODE	P_DETAILS	CLAIM_ID	CLAIMS_ID	C_DETAILS
1	10	hash1	pd 1	1	10	cd 1
1	10	hash1	pd 1	2	10	cd 2
2	10	hash2	pd 2	1	10	cd 1
2	10	hash2	pd 2	2	10	cd 2
3	11	hash1	pd 3	3	11	cd 3
3	11	hash1	pd 3	4	11	cd 4
4	11	hash3	pd 4	3	11	cd 3
4	11	hash3	pd 4	4	11	cd 4

CodePudding user response：

Do the joins in the same order that you described the relationships in the question.

SELECT c1.client_id, c2.claim_id, c2.date, p2.procedure_code
FROM procedures_tbl AS p1 -- Get claims associated with procedure hash1
JOIN claims_tbl AS c1 ON p1.claim_id = c1.claim_id -- get client_ID from those claims
JOIN claims_tbl AS c2 ON c1.client_id = c2.client_id -- get other claims for that client_id
JOIN procedures_tbl AS p2 ON p2.claim_id = c2.claim_id -- get those procedure codes
WHERE p1.procedure_code = 'hash1'

DEMO