Home > Back-end >  sql pivot the aggregate: getting wrong result
sql pivot the aggregate: getting wrong result

Time:02-14

Table 1:

CREATE TABLE Main (patient VARCHAR(5), surgery_date date);
INSERT INTO Main VALUES
   ('P01','2018-01-01'),
   ('P02','2019-01-02'),
   ('P03','2019-01-03'),
   ('P04','2020-07-07');

Table 2:

CREATE TABLE Additional (patient VARCHAR(5), record_date date, bpm integer);
INSERT INTO Additional VALUES
   ('P01','2018-01-01',60),
   ('P01','2018-02-01',49),
   ('P01','2018-02-10',52),
   ('P01','2018-03-10',57),
   ('P01','2018-04-10',57),
   ('P02','2019-01-02',46),
   ('P02','2019-02-02',55),
   ('P02','2019-03-02',55),
   ('P03','2019-01-03',60),
   ('P03','2019-02-03',49),
   ('P03','2019-03-03',49),
   ('P04','2021-07-07',71),
   ('P04','2021-08-07',49),
   ('P04','2021-09-07',49);

First table has surgery date and second table has bpm readings post surgery......I want to have % of patient having avg bpm (averaged over calendar month) of > 50 in first and second calendar month post surgery. (need to ignore all data after second calendar month of surgery)

This is my attempt:

SELECT
   FORMAT(surgery_date,'yyyy-MM') as [Month],
   count(*) as [New_Surgery],
   (100*SUM(COALESCE(CASE WHEN b1.bpm>50 THEN 1 ELSE 0 END,0))) as [Month1_percentage],
   (100*SUM(COALESCE(CASE WHEN b2.bpm>50 THEN 1 ELSE 0 END,0))) as [Month2_percentage]
FROM Main a
LEFT JOIN Additional b1 On b1.patient=a.patient and b1.record_date=a.surgery_date
LEFT JOIN Additional b2 On b2.patient=a.patient and b2.record_date=DATEADD(M,1,a.surgery_date)
GROUP BY FORMAT(surgery_date,'yyyy-MM')

But this is not giving me what it should ....

My expectation is:

Month   New_Surgery Month1_percentage   Month2_percentage
2018-01 1   100 100
2019-01 2   50  50
2020-07 1   100 0

CodePudding user response:

I have a feeling I've over-complicated this by breaking into several stages using a couple of CTEs, by first limiting rows to just those required and then pivoting the bpm column before calculating the percentages, however it does return your desired results.

Your main table is - for these results - superfluous since all the data exists in the Additional table; you can of course join with the main table if there are additional columns to retrieve that are omitted here.

with d as (
  select *,
    First_Value(record_date) over(partition by patient order by record_date) Sdate,
    Avg(bpm) over(partition by patient, Year(record_date), Month(record_date)) av
    from Additional
), m as (
    select patient, record_date, Format(sdate,'yyyy-MM') [Month], 
      case when Month(record_date)=Month(sdate) and av >= 50 then av end m1,
      case when Month(record_date)>Month(sdate) and av >= 50 then av end m2
    from d
    where DateDiff(month, sdate, record_date) < 2
), p as (
    select Min([Month])[Month], 
      Count(m1) * 1.0 / Count(*) * 100 M1Percent, 
      Count(m2) * 1.0 / Count(*) * 100 M2Percent, 
      Count(distinct patient) [New_Surgery]
    from m
    group by Year(record_date), Month(record_date)
)
select [month], Max([New_Surgery]) [New_Surgery],
    Max(M1Percent) Month1_Percentage, 
    Max(M2Percent) Month2_Percentage 
from p
group by [Month]
order by [Month]

Working DB<>Fiddle

CodePudding user response:

Your JOIN conditions are only returning record at exactly one month or exactly 2 months.I have changed this to include all tests during the first month and all tests during the second month. Also you had forgotten to divide the sum of cases by the number of cases. I have added /COUNT(*). The small data set means that the results are very simple:0, 50 or 100%.

SELECT
   FORMAT(surgery_date,'yyyy-MM') as [Month],
   count(*) as [New_Surgery],
   (100*SUM(COALESCE(CASE WHEN b1.bpm>50 THEN 1 ELSE 0 END,0)))/COUNT(*) as [Month1_percentage],
   (100*SUM(COALESCE(CASE WHEN b2.bpm>50 THEN 1 ELSE 0 END,0)))/COUNT(*) as [Month2_percentage]
FROM Main a
LEFT JOIN Additional b1 On b1.patient=a.patient 
   and b1.record_date <= DATEADD(month, 1 , surgery_date)
LEFT JOIN Additional b2 On b2.patient=a.patient 
   and b2.record_date<=DATEADD(M,2,a.surgery_date)
      and b2.record_date>DATEADD(month, 1 , surgery_date)
GROUP BY FORMAT(surgery_date,'yyyy-MM')

This gives:

Month   | New_Surgery | Month1_percentage | Month2_percentage
:------ | ----------: | ----------------: | ----------------:
2018-01 |           2 |                50 |               100
2019-01 |           4 |                50 |                50
2020-07 |           1 |                 0 |                 0
  • Related