I have a self-referential table, which I am designing to describe mixtures of ingredients:

id | raw_input | parent_input | amount |
---|---|---|---|
a | x | | 4 |
a | y | | 6 |
b | j | | 1 |
b | k | | 3 |
c | | a | 6 |
c | | b | 1 |
d | | c | 1 |
d | | a | 1 |
I'd like to write a recursive CTE query that calculates the fraction of each raw_input in each individual mix. For example, I'd like this output:
id | raw_input | amount |
---|---|---|
a | x | 0.4 |
a | y | 0.6 |
b | j | 0.25 |
b | k | 0.75 |
c | x | 0.34285714 |
c | y | 0.51428571 |
c | j | 0.03571429 |
c | k | 0.10714286 |
- I haven't included mixture d here, as it's trickier to calculate at this stage.

The values are calculated as follows:
id | raw_input | amount |
---|---|---|
a | x | 4/(4+6) |
a | y | 6/(4+6) |
b | j | 1/(1+3) |
b | k | 3/(1+3) |
c | x | 0.4*(6/(6+1)) |
c | y | 0.6*(6/(6+1)) |
c | j | 0.25*(1/(6+1)) |
c | k | 0.75*(1/(6+1)) |
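The hand calculations in the table above follow one general recurrence, which is what a query has to reproduce at any depth. The notation here is introduced for illustration only and is not part of the original question: $T(m)$ is the total amount of all rows of mix $m$, $f(m, r)$ is the fraction of raw input $r$ in mix $m$, and $\operatorname{amount}(m, r) = 0$ when $r$ is not a direct raw input of $m$. Mix d then falls out of the same rule:

```latex
\begin{aligned}
T(m) &= \sum_{k} \operatorname{amount}(m, k) \\
f(m, r) &= \frac{1}{T(m)} \Big( \operatorname{amount}(m, r) + \sum_{p} \operatorname{amount}(m, p)\, f(p, r) \Big) \\
f(c, x) &= \tfrac{1}{7}\,\big(6 \cdot 0.4 + 1 \cdot 0\big) \approx 0.34285714 \\
f(d, x) &= \tfrac{1}{2}\,\big(1 \cdot f(c, x) + 1 \cdot f(a, x)\big) = \tfrac{1}{2}\,(0.34285714 + 0.4) \approx 0.37142857
\end{aligned}
```

For d, raw input x contributes along two paths (one through c, one directly through a), so the per-path contributions have to be added together.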
My approach was to join an aggregate total onto the tables in the CTE and then divide the amounts by it, like this:
```sql
WITH cte AS (
    SELECT id, raw_input, mass_fraction FROM
        (SELECT E.id, E.raw_input, E.amount/root_totals.total_mass AS mass_fraction
         FROM mix_table E
         JOIN (SELECT id, SUM(amount) AS total_mass
               FROM mix_table
               GROUP BY id
              ) AS root_totals
           ON root_totals.id = E.id
         WHERE E.raw_input IS NOT NULL) AS r
    UNION ALL
    SELECT b.id, raw_input, mass_fraction/branch_totals.total_mass FROM
        (SELECT F.id, cte.raw_input, cte.mass_fraction
         FROM mix_table F
         JOIN cte ON F.parent_input = cte.id) AS b
    -- SQL Server rejects this join: GROUP BY is not allowed in the recursive part of a CTE
    JOIN (SELECT id, SUM(amount) AS total_mass
          FROM mix_table
          GROUP BY id
         ) AS branch_totals
      ON branch_totals.id = b.id
)
SELECT * FROM cte
```
Running it without the totals joined onto the CTE gets most of the way there; the only problem is that the components of mixture c are not scaled by their respective fractions.

It seems like a CTE with an aggregate function is exactly what I need, but SQL Server doesn't allow GROUP BY or aggregate functions in the recursive part of a CTE, so the query raises an error.

There must be a way around this; I'm sure I'm not the first person who has wanted to do this.

Edit: To clarify, I'm aiming to do this for more than one level, so that the solution extends to an arbitrary depth of nested parent/child mixes.
Answer:

As you only have 2 levels, it would seem easier to do the UNION ALL outside of the CTE, rather than within it:
```sql
-- Sample data
SELECT *
INTO dbo.YourTable
FROM (VALUES('a','x',NULL,4),
            ('a','y',NULL,6),
            ('b','j',NULL,1),
            ('b','k',NULL,3),
            ('c',NULL,'a',6),
            ('c',NULL,'b',1))V(id,raw_input,parent_input, amount);
GO
WITH CTE AS(
    -- Each row's share of its own mix; *1. avoids integer division
    SELECT id,
           raw_input,
           amount,
           parent_input,
           (amount*1.) / SUM(Amount) OVER (PARTITION BY id) AS amountperc
    FROM dbo.YourTable YT)
-- Level 1: mixes made directly from raw inputs
SELECT id,
       raw_input,
       --amount,
       amountperc AS amount
FROM CTE
WHERE parent_input IS NULL
UNION ALL
-- Level 2: each sub-mix row (C) expanded into the raw rows of the mix it references (P)
SELECT C.id,
       P.raw_input,
       C.amountperc * P.amountperc
FROM CTE P
     JOIN CTE C ON P.id = C.parent_input;
GO
DROP TABLE dbo.YourTable;
```
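The query above is fixed at two levels. As a sketch of how the same window-function idea might extend to the arbitrary nesting asked about in the question's edit (this is not code from either answer), each row's share of its own mix can be computed in a non-recursive CTE, where a windowed `SUM` is allowed, and a recursive CTE can then walk from every mix down to its raw inputs, multiplying shares along the way. The temp tables `#mix` and `#fractions` are hypothetical names, `amount` is assumed to be `float`, and the data is assumed to contain no cycles (a cycle would run into SQL Server's default recursion limit of 100).

```sql
-- Hypothetical sample data: the question's table, including mix d
CREATE TABLE #mix (
    id           varchar(10) NOT NULL,
    raw_input    varchar(10) NULL,
    parent_input varchar(10) NULL,
    amount       float       NOT NULL  -- float avoids integer division
);
INSERT INTO #mix (id, raw_input, parent_input, amount)
VALUES ('a','x',NULL,4), ('a','y',NULL,6),
       ('b','j',NULL,1), ('b','k',NULL,3),
       ('c',NULL,'a',6), ('c',NULL,'b',1),
       ('d',NULL,'c',1), ('d',NULL,'a',1);

WITH shares AS (
    -- Non-recursive CTE: each row's share of its own mix
    SELECT id, raw_input, parent_input,
           amount / SUM(amount) OVER (PARTITION BY id) AS share
    FROM #mix
),
expand AS (
    -- Anchor: every row of every mix, tagged with the mix being analysed
    SELECT id AS mix_id, raw_input, parent_input, share AS fraction
    FROM shares
    UNION ALL
    -- Recursive member: replace a sub-mix row with that sub-mix's rows,
    -- scaled by the sub-mix's share (no aggregates needed here)
    SELECT e.mix_id, s.raw_input, s.parent_input, e.fraction * s.share
    FROM expand AS e
    JOIN shares AS s ON s.id = e.parent_input
)
SELECT mix_id AS id, raw_input, SUM(fraction) AS amount
INTO #fractions
FROM expand
WHERE raw_input IS NOT NULL      -- keep only raw-ingredient rows
GROUP BY mix_id, raw_input;      -- add up contributions from different paths

SELECT id, raw_input, amount FROM #fractions ORDER BY id, raw_input;
```

Because a raw input can reach a mix by more than one path (x reaches d through a and again through c), the final `GROUP BY` sums the per-path contributions; for d/x this should come out at about 0.37142857, i.e. (0.34285714 + 0.4)/2.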
Answer:
I managed to solve my own quandary. For future readers: SQL Server doesn't allow the aggregation inside the recursive part of the CTE, but the totals only need to be calculated once, so I can compute them in a separate CTE beforehand and then join to that inside the recursive CTE:
```sql
-- I first define the total aggregation here, outside the recursive CTE
WITH totals AS (SELECT id, SUM(amount) AS total_mass
                FROM test_mix
                GROUP BY id),
cte AS (
    SELECT id, raw_input, amount FROM
        -- CAST to float so integer amounts don't truncate (4/10 would give 0)
        -- and the anchor and recursive columns end up with the same type
        (SELECT E.id, E.raw_input, CAST(E.amount AS float)/totals.total_mass AS amount
         FROM test_mix E
         JOIN totals
           ON totals.id = E.id
         WHERE E.raw_input IS NOT NULL) AS r
    UNION ALL
    SELECT b.id, raw_input, child_q*(parent_q/totals.total_mass) FROM
        (SELECT F.id, cte.raw_input, cte.amount AS parent_q, F.amount AS child_q
         FROM test_mix F
         JOIN cte ON F.parent_input = cte.id) AS b
    JOIN totals -- then use it here
      ON totals.id = b.id
)
-- A raw input can reach a mix by several paths (e.g. x reaches d via a and via c), so sum them
SELECT id, raw_input, SUM(amount) AS amount
FROM cte
GROUP BY id, raw_input
```