Home > Blockchain >  SQL nested query under WHERE
SQL nested query under WHERE

Time:05-08

One of the test questions came by with following schemas, to look for the best doctor in terms of:

  1. Best scored;
  2. The most times/attempts;
  3. For each medical procedures (in terms of name)

[doctor] table

id first_name last_name age
1 Phillip Singleton 50
2 Heidi Elliott 34
3 Beulah Townsend 35
4 Gary Pena 36
5 Doug Lowe 45

[medical_procedure] table

id doctor_id name score
1 3 colonoscopy 44
2 1 colonoscopy 37
3 4 ulcer surgery 98
4 2 angiography 79
5 3 angiography 84
6 3 embolization 87

and list goes on...

Given solution as follow:

WITH cte AS(
SELECT
    name,
    first_name,
    last_name,
    COUNT(*) AS procedure_count,
    RANK() OVER(
      PARTITION BY name
      ORDER BY COUNT(*) DESC) AS place
 FROM
    medical_procedure p JOIN doctor d
    ON p.doctor_id = d.id
 WHERE
    score >= (
      SELECT AVG(score)
      FROM medical_procedure pp
      WHERE pp.name = p.name)
 GROUP BY
    name,
    first_name,
    last_name
 )
 
 SELECT
    name,
    first_name,
    last_name
 FROM cte
 WHERE place = 1;

It'll mean a lot to be clarified on/explain on how the WHERE clause worked out under the subquery:

  1. How it worked out in general
  2. Why must we match the two pp.name and p.name for it to reflect the correct rows...
...
 WHERE
    score >= (
      SELECT AVG(score)
      FROM medical_procedure pp
      WHERE pp.name = p.name)
 ...

Thanks a heap!

CodePudding user response:

Above is join with doctor and medical procedure and group by procedure name and you need doctor names with most attempt and best scored. Subquery will join by procedure avg score and those who have better score than avg will be filtered.

Now there can be multiple doctor better than avg so taken rank by procedure count so most attempted will come first and then you taken first to pick top one

CodePudding user response:

clarified on/explain on how the WHERE clause worked out under the subquery:

Your subquery says "where the doctor's score is above average for that procedure", and that's not part of the question. The subquery is unnecessary.

Your main query's RANK() function is also incorrect. You're ranking by number of procedures done (ORDER BY COUNT(*) DESC), but you really should rank by the doctor's average score in each procedure.

I would approach the issue like so:

select
    name,
    doctor_id,
    avg(score) avg_score,
    count(*) procedure_count,
    rank() over (partition by name order by avg(score) desc) place
from
    medical_procedure
group by
    name,
    doctor_id

This gives us a single row for each procedure and doctor, ranked by how well doctors do on average for each procedure.

Now it's easy to produce the output you want

with
    procedure_ranked as (
        -- the above query
    )
select
    p.name,
    d.first_name,
    d.last_name,
    p.procedure_count
from
    procedure_ranked p
    inner join doctor d on d.id = p.doctor_id
where
    p.place = 1;
  • Related