I have this table (table_2), which basically provides patient info and the department they visited on specific dates -
Patient_ID | department | service_dte | birth_dte | count |
---|---|---|---|---|
12345 | cardiac | 1/2/21 | 6/18/78 | 5 |
12345 | cardiac | 8/20/21 | 6/18/78 | 5 |
12345 | cardiac | 10/28/21 | 6/18/78 | 5 |
12345 | ortho | 1/14/21 | 6/18/78 | 2 |
12345 | ortho | 7/7/21 | 6/18/78 | 2 |
12345 | cardiac | 8/20/21 | 6/18/78 | 5 |
12345 | cardiac | 4/19/21 | 6/18/78 | 5 |
12345 | obgyn | 4/1/21 | 6/18/78 | 1 |
78645 | neuro | 5/1/21 | 7/18/87 | 2 |
78645 | neuro | 7/7/21 | 7/18/87 | 2 |
78645 | ent | 7/7/21 | 7/18/87 | 1 |
32423 | gastro | 11/7/21 | 3/12/57 | 1 |
I would like my desired output to produce an additional column (max) that provides highest repeated "department" per "Patient_ID", while keeping all the columns as they are, similar to below table -
Patient_ID | department | service_dte | birth_dte | count | max |
---|---|---|---|---|---|
12345 | cardiac | 1/2/21 | 6/18/78 | 5 | cardiac |
12345 | cardiac | 8/20/21 | 6/18/78 | 5 | cardiac |
12345 | cardiac | 10/28/21 | 6/18/78 | 5 | cardiac |
12345 | ortho | 1/14/21 | 6/18/78 | 2 | cardiac |
12345 | ortho | 7/7/21 | 6/18/78 | 2 | cardiac |
12345 | cardiac | 8/20/21 | 6/18/78 | 5 | cardiac |
12345 | cardiac | 4/19/21 | 6/18/78 | 5 | cardiac |
12345 | obgyn | 4/1/21 | 6/18/78 | 1 | cardiac |
78645 | neuro | 5/1/21 | 7/18/87 | 2 | neuro |
78645 | neuro | 7/7/21 | 7/18/87 | 2 | neuro |
78645 | ent | 7/7/21 | 7/18/87 | 1 | neuro |
32423 | gastro | 11/7/21 | 3/12/57 | 1 | gastro |
I tried below code; that provided me the count of department a patient visited in the above table -
select *
, count(department) OVER (PARTITION BY patient_id, department) AS count
FROM table_1
however, it is not giving me desired value for patient 12345's max column, which should be cardiac. Instead of cardiac, I am getting ortho. Please kindly assist. This is the code I am trying -
select *
, max(department) OVER (PARTITION BY patient_id) AS max
FROM table_2
Please kindly assist with an optimal way I can get the maximum column and advise on what I am or might be doing wrong.
Thank you.
CodePudding user response:
Use also FIRST_VALUE()
window function in a 2nd level of aggregation:
SELECT *, FIRST_VALUE(department) OVER (PARTITION BY patient_id ORDER BY count DESC) AS max
FROM (
SELECT *, COUNT(*) OVER (PARTITION BY patient_id, department) AS count
FROM table_1
) t
See the demo.
CodePudding user response:
count(department) OVER (PARTITION BY patient_id, department)
would be calculated for each row in your results, simply counting the number of records (where department is not null, to be specific) for the same patient & department as this row. This is not what you wanted.
max(department) OVER (PARTITION BY patient_id) AS max
would also be calculated for each row in your results, simply finding the maximum of department name for the same patient as this row. This is the last department when all the departments visited by this patient are alphabetically sorted (this is how MAX() function is applied to character values), and this is not what you wanted.
We want a subquery to return the most visited department by each patient, here is an example:
select
tbl1.*
, MostVisited.department as MostVisitedDepartment
from
table_1 tbl1
inner join
(-- Take only the most visited department for each patient
select *
from (-- rank departments by number of times
-- visited for each patient; highest visited gets 1
-- the result of a tie (the patient visited two departments the
-- same number of times) depends on how the rdbms executes it.
select
VisitCnt.*
, row_number() over (partition by VisitCnt.patient_id
order by VisitCnt.TimesVisited desc) as Seq
from (-- Count each department visited by each patient
select
patient_id, department, count(*) as TimesVisited
from table_1
group by patient_id, department
) VisitCnt
) VisitCntRnked
where VisitCntRnked.Seq=1
) MostVisited
on MostVisited.Patient_ID=tbl1.Patient_ID
As to being 'optimal': it depends on what you mean by it : The table size, indexes, statistics on the table, your particular database product, and, if you want to run this query for the entire (or most of the) population of the table, or only a small subset. We generally expect the query optimiser to take that these account in executing the query. There are many exceptions but optimising the query needs much more information than what is available in this question.