I have the following two tables:
claims_tbl
client_ID claim_ID date
personA claim1 date1
personB claim2 date2
personB claim3 date3
personB claim4 date4
personC claim5 date5
personD claim6 date5
procedures_tbl
claim_id procedure_code
claim1 hash1
claim2 hash27
claim3 hash1
claim4 hash45
claim5 hash22
claim6 hash1
I would like to extract the claim_IDs associated with hash1
from claims_tbl then, extract the client_ID associated with that claimID
. Then use that to extract all the procedures associated with that client_ID
.
I have:
select *
from ((select distinct c.claims_id
from procedures_tbl p
inner join claims_tbl c
on p.claim_id=c.claim_id
and p.procedure_code = 'hash1') t
inner join claims_tbl c
on c.claims_id=t.claims_id) t
inner join from procedures_tbl p
on t.claims_id=p.claims_id
Is there a more efficient way to do this?
my expected table would be:
client_ID claim_ID date procedure_code
personA claim1 date1 hash1
personB claim2 date2 hash27
personB claim3 date3 hash1
personB claim4 date4 hash45
personD claim6 date5 hash1
CodePudding user response:
so I just rewrote the layout of you SQL a little bit:
select *
from (
select * from
(
select distinct
c.claims_id
from procedures_tbl p
join claims_tbl c
on p.claim_id = c.claim_id
and p.procedure_code = 'hash1'
) t1
join claims_tbl c
on c.claims_id = t1.claims_id
) t2
join from procedures_tbl p
on t2.claims_id=p.claims_id
t1 is select the distinct set if claims_id's that are both in prcedures and claims tables, that also have procedure_code_hash1.
This is used to select all claims with the same id
which is then used to select all procedures with the same claim id.
the distinct in the t1 block implies there is more than one to one relations between these two tables.
So making some fake data that represents this:
WITH procedures_tbl(claim_id, claims_id, procedure_code,p_details) AS (
SELECT * FROM VALUES
(1, 10, 'hash1', 'pd 1'),
(2, 10, 'hash1', 'pd 2'),
(3, 11, 'hash1', 'pd 3'),
(4, 11, 'hash1', 'pd 4')
), claims_tbl(claim_id, claims_id, c_details) AS (
SELECT * FROM VALUES
(1, 10, 'cd 1'),
(2, 10, 'cd 2'),
(3, 11, 'cd 3'),
(4, 11, 'cd 4')
)
and then modifying the SQL so it runs (in Snowflake)
select p.*
,t2.*
from (
select c.* from
(
select distinct
c.claims_id
from procedures_tbl p
join claims_tbl c
on p.claim_id = c.claim_id
and p.procedure_code = 'hash1'
) t1
join claims_tbl c
on c.claims_id = t1.claims_id
) t2
join procedures_tbl p
on t2.claims_id = p.claims_id
order by 1,2,3,4;
CLAIM_ID | CLAIMS_ID | PROCEDURE_CODE | P_DETAILS | CLAIM_ID | CLAIMS_ID | C_DETAILS |
---|---|---|---|---|---|---|
1 | 10 | hash1 | pd 1 | 1 | 10 | cd 1 |
1 | 10 | hash1 | pd 1 | 2 | 10 | cd 2 |
2 | 10 | hash1 | pd 2 | 1 | 10 | cd 1 |
2 | 10 | hash1 | pd 2 | 2 | 10 | cd 2 |
3 | 11 | hash1 | pd 3 | 3 | 11 | cd 3 |
3 | 11 | hash1 | pd 3 | 4 | 11 | cd 4 |
4 | 11 | hash1 | pd 4 | 3 | 11 | cd 3 |
4 | 11 | hash1 | pd 4 | 4 | 11 | cd 4 |
so this gets gets the full procedures_tbl joined claims_tbl for any claims_id where any one entry in has a code = hash1
This is the same as:
SELECT p.*
,c.*
FROM procedures_tbl p
JOIN claims_tbl c
ON c.claims_id = p.claims_id
WHERE c.claims_id IN (
select
sc.claims_id
from procedures_tbl sp
join claims_tbl sc
on sp.claim_id = sc.claim_id
and sp.procedure_code = 'hash1'
);
And they should have the same execution plan. But I would guess your plan will be faster in Snowflake, because if procedures_tbl and claims_tbl are large, the WHERE IN filter might not always be run soon enough.
I would be inclined to write it this way as it's the same:
select
p.*
,c.*
from (
select distinct
c.claims_id
from procedures_tbl p
join claims_tbl c
on p.claim_id = c.claim_id
and p.procedure_code = 'hash1'
) d
join claims_tbl c
on c.claims_id = d.claims_id
join procedures_tbl p
on p.claims_id = d.claims_id
order by 1,2,3,4;
thus with this slightly expanded data:
WITH procedures_tbl(claim_id, claims_id, procedure_code,p_details) AS (
SELECT * FROM VALUES
(1, 10, 'hash1', 'pd 1'),
(2, 10, 'hash2', 'pd 2'),
(3, 11, 'hash1', 'pd 3'),
(4, 11, 'hash3', 'pd 4'),
(5, 12, 'hash2', 'pd 5'),
(6, 12, 'hash3', 'pd 6')
), claims_tbl(claim_id, claims_id, c_details) AS (
SELECT * FROM VALUES
(1, 10, 'cd 1'),
(2, 10, 'cd 2'),
(3, 11, 'cd 3'),
(4, 11, 'cd 4'),
(5, 12, 'cd 5'),
(6, 12, 'cd 6')
)
we get:
CLAIM_ID | CLAIMS_ID | PROCEDURE_CODE | P_DETAILS | CLAIM_ID | CLAIMS_ID | C_DETAILS |
---|---|---|---|---|---|---|
1 | 10 | hash1 | pd 1 | 1 | 10 | cd 1 |
1 | 10 | hash1 | pd 1 | 2 | 10 | cd 2 |
2 | 10 | hash2 | pd 2 | 1 | 10 | cd 1 |
2 | 10 | hash2 | pd 2 | 2 | 10 | cd 2 |
3 | 11 | hash1 | pd 3 | 3 | 11 | cd 3 |
3 | 11 | hash1 | pd 3 | 4 | 11 | cd 4 |
4 | 11 | hash3 | pd 4 | 3 | 11 | cd 3 |
4 | 11 | hash3 | pd 4 | 4 | 11 | cd 4 |
CodePudding user response:
Do the joins in the same order that you described the relationships in the question.
SELECT c1.client_id, c2.claim_id, c2.date, p2.procedure_code
FROM procedures_tbl AS p1 -- Get claims associated with procedure hash1
JOIN claims_tbl AS c1 ON p1.claim_id = c1.claim_id -- get client_ID from those claims
JOIN claims_tbl AS c2 ON c1.client_id = c2.client_id -- get other claims for that client_id
JOIN procedures_tbl AS p2 ON p2.claim_id = c2.claim_id -- get those procedure codes
WHERE p1.procedure_code = 'hash1'