Home > Software design >  LAG All Previous Rows - Parent Child Database Find All Total Children Count From Each Parent
LAG All Previous Rows - Parent Child Database Find All Total Children Count From Each Parent

Time:08-04

I'm trying to obtain all children count from each parent (where subchildren counts).

This is the sample database (MSSQL).

INSERT INTO NODES VALUES(NULL, 1);
INSERT INTO NODES VALUES(NULL, 2);
INSERT INTO NODES VALUES(2, 3);
INSERT INTO NODES VALUES(1, 4);
INSERT INTO NODES VALUES(1, 5);
INSERT INTO NODES VALUES(3, 6);
INSERT INTO NODES VALUES(NULL, 7);
INSERT INTO NODES VALUES(NULL, 8);
INSERT INTO NODES VALUES(8, 9);
INSERT INTO NODES VALUES(7, 10);
INSERT INTO NODES VALUES(9, 11);
INSERT INTO NODES VALUES(11, 12);
INSERT INTO NODES VALUES(10, 13);
INSERT INTO NODES VALUES(10, 14);
INSERT INTO NODES VALUES(4, 15);

Where the hierarchy is;

- 1
   - 4
      - 15
   - 5
- 2
   - 3
      - 6
- 7
   - 10
      - 13
      - 14
- 8
   - 9
      - 11
         - 12

And the desired result is:

id Children Count
1 3
4 1
5 0
2 2
3 1
6 0
7 3
10 2
13 1
14 0
8 3
9 2
11 1
12 0

Every time I make a strategy to formulate a query, I get to the point where I must iterate the table formed by the query at running time.

If I go deep out grouping the results (with the parentid) I could generate a count of only the children (not the subchildren), so that we can add up to the root. But clearly I would need to iterate the table that I have been forming in the query, I don't know if it would be correct to say recursively.

To express myself better I will show it with the part I have reached and what I want to do;

WITH tree AS
(
    SELECT n1.parentid, n1.id, 1 AS level
    FROM NODES AS n1
    WHERE n1.parentid IS NULL
    UNION ALL
    SELECT n2.parentid, n2.id, level   1 AS level
    FROM NODES AS n2
    INNER JOIN tree ON n2.parentid = tree.id
), levels AS 
(
  SELECT * 
  FROM tree
)
SELECT parentid, id, (COUNT(*) OVER(PARTITION BY parentid ORDER BY parentid)) AS childrencountofparentid, 
    ROW_NUMBER() OVER(ORDER BY parentid DESC) AS rownumber
FROM levels

Where the output is:

parentid id childrencountofparentid rownumber
11 12 1 1
10 13 2 2
10 14 2 3
9 11 1 4
8 9 1 5
7 10 1 6
4 15 1 7
3 6 1 8
2 3 1 9
1 4 2 10
1 5 2 11
null 1 4 12
null 2 4 13
null 7 4 14
null 8 4 15

I want to do this: Actual query Full Image

I want to use the results of the previous rows, similar to lag but i must iterate all previous rows.

CodePudding user response:

Here is solution that works if you don't care for id order. Unfortunately, I don't know how to make it ordered as you did in desired output. If nothing I hope it helps you come to solution as data is good.

WITH cte as (
 SELECT p.Id as 'PID', c.Id as 'CID' FROM nodes p left join nodes c on p.id = c.ParentId
 UNION ALL
 SELECT c.PID,p.ID FROM cte c JOIN nodes p ON c.CID=p.ParentId
)
SELECT PID as id, count(*) as 'Children Count' FROM cte where CID IS NOT NULL GROUP BY PID 
UNION ALL
SELECT PID, 0 FROM cte WHERE CID IS NULL GROUP BY PID
ORDER BY PID ASC

CodePudding user response:

I'll solve this with the hierarchyid datatype. First, a recursive CTE to calculate that value for each row in your dataset.

with cte as (
    select ID, ParentID, 
       h = cast(concat('/', ID, '/') as varchar(100))
    from NODES
    where ParentID is null

    union all

    select child.ID, child.ParentID,
        h = cast(concat(Parent.h, child.ID, '/') as varchar(100))
    from NODES as child
    join cte as Parent
        on child.ParentID = Parent.ID
)
select ID, ParentID,
    h = cast(h as hierarchyid)
into #t
from cte;

The only "tricky" thing here is I'm building a string as I'm traversing the hierarchy that can be converted to the hierarchyid datatype.

From there, it's fairly easy to count children with what amounts to a self-join.

select ID, h.ToString(), count(child.a)
from #t as parent
outer apply (
    select a = 1
    from #t as child
    where child.h.IsDescendantOf(parent.h) = 1
        and child.ID <> Parent.ID
) as child
group by parent.ID, h
order by h;

Note - I'm only including the h column in the result set to a) show the path back to parent and b) to provide ordering. If you need neither of those, it's not necessary to include. One note - in your desired results, you have 13 as having a child count of 1. I don't see anywhere in the data (or in your helpfully provided visualization of the hierarchy) that 13 has any children. I do see that it has a sibling in ID 14 - if that needs to be counted, the approach needs to change slightly.

One other thing to note here - if you can maintain the hierarchyid column in the source data (which isn't hard to do, by the way), there's no need for a recursive CTE at all. I like a mix of the two approaches. Which is to say keep the notion of ParentID as a column but use the hierarchyid in calculations. If the hierarchyid gets "corrupted" (which is possible given that it's really a cached/computed value that's derived from base data and application logic), it can be recalculated from the ID/ParentID data.

  • Related