Home > other >  How to get sum of the children nodes on every level in an hierarchical tree?
How to get sum of the children nodes on every level in an hierarchical tree?

Time:01-05

I have table "A" with hierarchical data like this:

create table dictionary_a
(
  id number not null,
  parent_id number,
  c_name varchar2(50),
  constraint pk_dictionary primary key (id),
  constraint fk_dictionary foreign key (parent_id) references dictionary_a (id)
);
id parent_id c_name
1            name1
2  1         name2
3  1         name3
4  3         name4
5  3         name5
6  2         name6
7  6         name7
...

(actual hierarchical data table has 7 levels, but this is subject to change)

And table "B" with data i need to sum up:

create table numeric_data
(
  dict_id number not null,
  n_sum number,
  constraint fk_numeric_data foreign key (dict_id) references dictionary_a (id)
);
dict_id n_sum
1       36.0
2       20.0
3       16.0
4       10.5
5       5.5
7       20.0
...

Note that higher level nodes also have sums tied to them.

I need to get the sum of all child nodes at each level and compare them with the actual data from the column n_sum (this column is populated by users and my job is to find all the inconsistencies):

dict_id n_sum actual_sum c_name
1       36.0  36.0       name1
2       20.0  20.0       name2
3       16.0  16.0       name3
4       10.5  10.5       name4
5       5.5   5.5        name5
6             20.0       name6
7       20.0  20.0       name7

I searched Internet, but what I could find was very closely related to specific problems, no general solution.

Test data:

insert into dictionary_a
select level,null,dbms_random.string('L',10) from dual
connect by level < 101;

update dictionary_a a set
a.parent_id = 
  case
    when a.id between 6 and 20 then trunc(dbms_random.value(1,6))
    when a.id between 21 and 40 then trunc(dbms_random.value(6,21))
    when a.id > 40 then trunc(dbms_random.value(21,41))
  end
where a.id > 5;

insert into numeric_data
select level,trunc(dbms_random.value(1,21),2) from dual
connect by level < 101;

commit;

I am working on Oracle 18c.

CodePudding user response:

You can use an outer join between your tables:

select da.id, da.parent_id, da.c_name, coalesce(nd.n_sum, 0) as n_sum
from dictionary_a da
left join numeric_data nd on nd.dict_id = da.id;

and then use that as a source for a hierarchical query, tracking the root ID, name and quantity:

select id,
  parent_id,
  n_sum,
  connect_by_root id as root_id,
  connect_by_root n_sum as root_n_sum,
  connect_by_root c_name as root_c_name,
  connect_by_isleaf as isleaf
from (
  select da.id, da.parent_id, da.c_name, coalesce(nd.n_sum, 0) as n_sum
  from dictionary_a da
  left join numeric_data nd on nd.dict_id = da.id
)
connect by parent_id = prior id;

and then sum the leaf nodes to get the values you seem to want:

with cte as (
  select id,
    parent_id,
    n_sum,
    connect_by_root id as root_id,
    connect_by_root n_sum as root_n_sum,
    connect_by_root c_name as root_c_name,
    connect_by_isleaf as isleaf
  from (
    select da.id, da.parent_id, da.c_name, coalesce(nd.n_sum, 0) as n_sum
    from dictionary_a da
    left join numeric_data nd on nd.dict_id = da.id
  )
  connect by parent_id = prior id
)
select root_id as dict_id,
  root_n_sum as n_sum,
  sum(n_sum) as actual_sum,
  root_c_name as c_name
from cte
where isleaf = 1
group by root_id, root_n_sum, root_c_name
order by root_id;

Which with your explicit sample data gives:

DICT_ID N_SUM ACTUAL_SUM C_NAME
1 36 36 name1
2 20 20 name2
3 16 16 name3
4 10.5 10.5 name4
5 5.5 5.5 name5
6 0 20 name6
7 20 20 name7

I've included coalesce(nv.n_sum, 0) so the 'original' n_sum value for ID 6 shows as zero rather than null, which your example doesn't have; it will show null if you just remove the coalesce, but including it means you can add a simple

having root_n_sum != sum(n_sum)

clause to only see the discrepancies. That clause just becomes more complicated if you leave nulls alone, but it might be preferable:

with cte as (
  select id,
    parent_id,
    n_sum,
    connect_by_root id as root_id,
    connect_by_root n_sum as root_n_sum,
    connect_by_root c_name as root_c_name,
    connect_by_isleaf as isleaf
  from (
    select da.id, da.parent_id, da.c_name, nd.n_sum
    from dictionary_a da
    left join numeric_data nd on nd.dict_id = da.id
  )
  connect by parent_id = prior id
)
select root_id as dict_id,
  root_n_sum as n_sum,
  sum(n_sum) as actual_sum,
  root_c_name as c_name
from cte
where isleaf = 1
group by root_id, root_n_sum, root_c_name
having (root_n_sum is null and sum(n_sum) is not null)
or (root_n_sum is not null and sum(n_sum) is null)
or root_n_sum != sum(n_sum)
order by root_id;

gives just:

DICT_ID N_SUM ACTUAL_SUM C_NAME
6 null 20 name6

db<>fiddle

CodePudding user response:

It is unclear what your expected output is since you are generating random data; however, to solve the problem:

I need to get the sum of all child nodes at each level

You can generate all the child nodes and use CONNECT_BY_ROOT to keep a record of the root id of the hierarchy; then you can sum the values to get the total:

SELECT root_id,
       MAX(c_name),
       SUM(n_sum) AS total
FROM   (
  SELECT CONNECT_BY_ROOT(id) AS root_id,
         CONNECT_BY_ROOT(c_name) AS c_name,
         n.n_sum
  FROM   dictionary_a d
         INNER JOIN numeric_data n
         ON (d.id = n.dict_id)
  CONNECT BY PRIOR d.id = d.parent_id
)
GROUP BY root_id
ORDER BY root_id

db<>fiddle here


What you appear to want is not to sum all the child nodes but sum all the leaf nodes:

SELECT root_id,
       MAX(c_name) AS c_name,
       MAX(root_sum) As n_sum,
       SUM(n_sum) AS total
FROM   (
  SELECT CONNECT_BY_ROOT id AS root_id,
         CONNECT_BY_ROOT c_name AS c_name,
         CONNECT_BY_ROOT n_sum AS root_sum,
         d.id,
         n.n_sum
  FROM   dictionary_a d
         LEFT OUTER JOIN numeric_data n
         ON (d.id = n.dict_id)
  WHERE  CONNECT_BY_ISLEAF = 1
  CONNECT BY PRIOR d.id = d.parent_id
)
GROUP BY root_id
ORDER BY root_id

Which, for your (non-random) sample data, outputs:

ROOT_ID C_NAME N_SUM TOTAL
1 name1 36 36
2 name2 20 20
3 name3 16 16
4 name4 10.5 10.5
5 name5 5.5 5.5
6 name6 null 20
7 name7 20 20

db<>fiddle here

CodePudding user response:

I need to get the sum of all child nodes at each level and compare them with the actual data from the column n_sum

So first outer join to your number table twice, once for id and once for parent_id.

The sum of all child nodes is as easy as an analytic SUM over parent_id.

Than simple select all rows where the child_sum does not match the node sum.

WITH dt AS (
select da.id, da.parent_id, da.c_name, 
sum(nd.n_sum) OVER (partition by da.parent_id) as child_sum, 
ndp.n_sum as id_sum
from dictionary_a da
left join numeric_data nd on nd.dict_id = da.id
left join numeric_data ndp on ndp.dict_id = da.parent_id
WHERE parent_id IS NOT NULL)
SELECT * FROM dt
WHERE nvl(child_sum,0) != nvl(id_sum,0)

As expected you get two problems

  • for parent 2 the child sum is null but the node sum is 20 and
  • for parent 6 the child sum is 20 but the node sum is null.
  •  Tags:  
  • Related