Home > Blockchain >  Is it possible to use a aggregate function over partition by as a case condition in SQL?
Is it possible to use a aggregate function over partition by as a case condition in SQL?

Time:03-14

Problem statement is to calculate median from a table that has two columns. One specifying a number and the other column specifying the frequency of the number. For e.g. Table "Numbers":

Num Freq
1 3
2 3

This median needs to be found for the flattened array with values: 1,1,1,2,2,2

Query:

with ct1 as 
(select num,frequency, sum(frequency) over(order by num) as sf from numbers o)
select case when count(num) over(order by num) = 1 then num 
when count(num) over (order by num) > 1 then sum(num)/2 end median 
    from ct1 b where sf <= (select max(sf)/2 from ct1) or (sf-frequency) <= (select max(sf)/2 from ct1)

Is it not possible to use count(num) over(order by num) as the condition in the case statement?

CodePudding user response:

Find the relevant row / 2 rows based of the accumulated frequencies, and take the average of num.
The linked Fiddle will also show you the computations leading to the result.

  • If you already know that num is unique, you can eliminate the 1st CTE.
  • If you have an ID column, it can be added to the ORDER BY clauses, and once again the 1st CTE can be eliminated.

with 
t1 as
(
    select     num
              ,sum(freq) as freq
            
    from       t
    
    group by   num
), 
t2 as
(
    select    t1.*
             ,nvl(sum(freq) over (order by num, freq rows between unbounded preceding and 1 preceding),0)  as freq_acc_sum_1
             ,sum(freq) over (order by num, freq)                                                          as freq_acc_sum_2
             ,sum(freq) over ()                                                                            as freq_sum
    
    from      t1
)
select avg(num)
from   t2
where  freq_sum/2 between freq_acc_sum_1 and freq_acc_sum_2

Fiddle

Example:

NUM FREQ FREQ_ACC_SUM_1 FREQ_ACC_SUM_2 FREQ_SUM RELEVANT_RECORD
20 2 0 2 30
24 2 2 4 30
28 3 4 7 30
31 3 7 10 30
42 2 10 12 30
48 2 12 14 30
55 1 14 15 30 V
69 5 15 20 30 V
71 5 20 25 30
73 5 25 30 30
MEDIAN
62

Fiddle for 1M records

CodePudding user response:

You can find the one (or two) middle value(s) and then average:

SELECT AVG(num) AS median
FROM   (
  SELECT num,
         freq,
         SUM(freq) OVER (ORDER BY num) AS cum_freq,
         (SUM(freq) OVER ()   1)/2 AS median_freq
  FROM   table_name
)
WHERE  cum_freq - freq < median_freq
AND    median_freq < cum_freq   1

Or, expand the values using a LATERAL join to a hierarchical query and then use the MEDIAN function:

SELECT MEDIAN(num) AS median
FROM   table_name t
       CROSS JOIN LATERAL (
         SELECT LEVEL
         FROM   DUAL
         WHERE  freq > 0
         CONNECT BY LEVEL <= freq
       )

Which, for the sample data:

CREATE TABLE table_name (Num, Freq) AS
SELECT 1, 3 FROM DUAL UNION ALL
SELECT 2, 3 FROM DUAL;

Outputs:

MEDIAN
1.5

(Note: For your sample data, there are 6 items, an even number, so the MEDIAN will be half way between the value of 3rd and 4rd items; so half way between 1 and 2 = 1.5.)

db<>fiddle here

  • Related