Aggregate functions in a recursive CTE to calculate fractions of sub-groups

Time:12-03

I have a self-referential table, which I am designing to describe mixtures of ingredients:

id  raw_input  parent_input  amount
a   x          NULL          4
a   y          NULL          6
b   j          NULL          1
b   k          NULL          3
c   NULL       a             6
c   NULL       b             1
d   NULL       c             1
d   NULL       a             1

I'd like to write a recursive CTE query that calculates the fraction of each raw_input in each individual mix. For example, I'd like the output:

id  raw_input  amount
a   x          0.4
a   y          0.6
b   j          0.25
b   k          0.75
c   x          0.34285714
c   y          0.51428571
c   j          0.03571429
c   k          0.10714286
  • I haven't added mixture d here as it's quite tricky to calculate at this stage

where the values are calculated as such:

id  raw_input  amount
a   x          4/(4+6)
a   y          6/(4+6)
b   j          1/4
b   k          3/4
c   x          0.4*(6/(6+1))
c   y          0.6*(6/(6+1))
c   j          0.25*(1/(6+1))
c   k          0.75*(1/(6+1))
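
To make the arithmetic above concrete, here is a minimal Python sketch of the same calculation, written as a plain recursive function rather than SQL. The names MIXES and raw_fractions are made up for illustration, and it assumes any component that is itself a mixture id is a sub-mixture. It also covers mixture d, which the table above leaves out.

```python
from fractions import Fraction

# Mixture table from the question: id -> list of (component, amount).
# Assumption: a component that is also a key of MIXES is a sub-mixture,
# anything else is a raw input.
MIXES = {
    "a": [("x", 4), ("y", 6)],
    "b": [("j", 1), ("k", 3)],
    "c": [("a", 6), ("b", 1)],
    "d": [("c", 1), ("a", 1)],
}

def raw_fractions(mix_id):
    """Return {raw_input: fraction} for one mixture, expanding sub-mixtures."""
    total = sum(amount for _, amount in MIXES[mix_id])
    result = {}
    for component, amount in MIXES[mix_id]:
        share = Fraction(amount, total)  # this component's share of the mix
        if component in MIXES:
            # Scale every raw input of the sub-mixture by the component's share.
            for raw, frac in raw_fractions(component).items():
                result[raw] = result.get(raw, 0) + share * frac
        else:
            result[component] = result.get(component, 0) + share
    return result

for mix_id in MIXES:
    print(mix_id, {raw: round(float(f), 8) for raw, f in raw_fractions(mix_id).items()})
```

Because x and y reach d by two routes (directly through a, and through c), their contributions have to be summed per raw input, which is what the result.get(...) accumulation does.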

My approach was to join an aggregate total onto the tables in the CTE, then divide the masses by it, like so:

WITH cte AS (
    SELECT id, base_input, mass_fraction FROM
        (SELECT E.id, E.base_input, E.amount/f.total_mass AS mass_fraction
        FROM mix_table E
        JOIN (SELECT id, SUM(amount) as total_mass
             FROM mix_table
             GROUP BY id
             ) AS root_totals
        ON root_totals.id = E.id
        WHERE E.base_input IS NOT NULL) AS r
        UNION ALL
        
        SELECT b.id, base_input, mass_fraction/totals.total_mass FROM 
        (SELECT F.id, cte.base_input, cte.amount/branch_totals.total_mass AS mass_fraction
         FROM mix_table F 
         JOIN cte on F.parent_input = cte.id) as b
         JOIN (SELECT id, SUM(amount) as total_mass
              FROM mix_table
              GROUP BY id 
         ) AS branch_totals
         ON branch_totals.id = totals.id
    )

select * from cte

Running it without the totals joined onto the CTE gets most of the way there; the only problem is that the individual components of mixture c are not scaled by their respective fractions.

A CTE with an aggregate function seems to be exactly what I want, but the error raised by SQL Server prevents me from doing it.

There must be a way around this; I'm sure I'm not the first person to want to do this.

Edit: I'd like to clarify that I'm aiming to do this for more than one level, so that the solution can be extended to an indeterminate number of nested parent/child levels.

CodePudding user response:

As you only have 2 levels, it would seem easier to do the UNION ALL outside of the CTE, rather than within it:

SELECT *
INTO dbo.YourTable
FROM (VALUES('a','x',NULL,4),
            ('a','y',NULL,6),
            ('b','j',NULL,1),
            ('b','k',NULL,3),
            ('c',NULL,'a',6),
            ('c',NULL,'b',1))V(id,raw_input,parent_input, amount);
GO
WITH CTE AS(
    SELECT id,
           raw_input,
           amount,
           parent_input,
           (amount*1.) / SUM(Amount) OVER (PARTITION BY id) AS amountperc
    FROM dbo.YourTable YT)
SELECT id,
       raw_input,
       --amount,
       amountperc AS amount
FROM CTE
WHERE parent_input IS NULL
UNION ALL
SELECT C.id,
       P.raw_input,
       C.amountperc * P.amountperc
FROM CTE P
     JOIN CTE C ON P.id = C.parent_input;
GO

DROP TABLE dbo.YourTable;

CodePudding user response:

I managed to solve my own quandary. For future viewers: I think this only works because the aggregation doesn't need to be run more than once. I can do the aggregation calculation before the recursive CTE, then use it within the CTE:

-- I first define the total aggregation here

WITH totals AS (SELECT id, SUM(amount) as total_mass
              FROM test_mix
              GROUP BY id),
cte AS (
    SELECT id, raw_input, amount FROM
        (SELECT E.id, E.raw_input, E.amount/totals.total_mass AS amount
        FROM test_mix E
        JOIN totals
        ON totals.id = E.id
        WHERE E.raw_input IS NOT NULL) AS r
        UNION ALL
        
        SELECT b.id, raw_input, child_q*(parent_q/totals.total_mass) FROM 
        (SELECT F.id, cte.raw_input, cte.amount AS parent_q, F.amount AS child_q
         FROM test_mix F 
         JOIN cte on F.parent_input = cte.id) as b
         JOIN totals                                   -- then use it here
         ON totals.id = b.id
    )

SELECT id, raw_input, SUM(amount) AS amount   -- sum the paths that reach the same raw input
FROM cte
GROUP BY id, raw_input
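
As a rough check that pre-aggregating the totals also handles more than one level of nesting, here is a sketch that runs the same idea against SQLite through Python's sqlite3 module, with the mixture d rows from the question included. SQLite stands in for SQL Server here, so the syntax differs slightly (WITH RECURSIVE), and the amount column is assumed to be a floating-point type so the division is not truncated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: amount is REAL so that amount / total_mass is not integer division.
conn.execute(
    "CREATE TABLE test_mix (id TEXT, raw_input TEXT, parent_input TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO test_mix VALUES (?, ?, ?, ?)",
    [
        ("a", "x", None, 4), ("a", "y", None, 6),
        ("b", "j", None, 1), ("b", "k", None, 3),
        ("c", None, "a", 6), ("c", None, "b", 1),
        ("d", None, "c", 1), ("d", None, "a", 1),
    ],
)

QUERY = """
WITH RECURSIVE
totals AS (                      -- aggregate once, outside the recursion
    SELECT id, SUM(amount) AS total_mass FROM test_mix GROUP BY id
),
cte AS (
    -- anchor: raw inputs, as a fraction of their own mixture
    SELECT E.id, E.raw_input, E.amount / totals.total_mass AS amount
    FROM test_mix E
    JOIN totals ON totals.id = E.id
    WHERE E.raw_input IS NOT NULL
    UNION ALL
    -- recursive step: scale the parent's fractions by the child row's share
    SELECT F.id, cte.raw_input, F.amount * cte.amount / totals.total_mass
    FROM test_mix F
    JOIN cte ON F.parent_input = cte.id
    JOIN totals ON totals.id = F.id
)
SELECT id, raw_input, SUM(amount) AS amount
FROM cte
GROUP BY id, raw_input
ORDER BY id, raw_input
"""

fractions = {(mix, raw): amount for mix, raw, amount in conn.execute(QUERY)}
for key, amount in fractions.items():
    print(key, round(amount, 8))
```

The final GROUP BY matters for d: x and y arrive both directly via a and indirectly via c, so the recursion produces two rows for each of them and they need to be summed.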