Home > Net >  How to get the sum of each branch in a Tree structure?
How to get the sum of each branch in a Tree structure?

Time:08-16

I have the next Tree Structure :

enter image description here

This structure is represented in the database with the next Table:

--------------------------------Employees-------------------------------------------

-------------------------------------------------------------------------------------
|        Employee_name         |           Global_Team             |       Team      |
-------------------------------------------------------------------------------------
|name_of_head_of_engineering   |           -                       | Engineering     |
-------------------------------------------------------------------------------------
|name_1_T1                     |         Engineering               | Team_1          |
-------------------------------------------------------------------------------------
|name_2_T1                     |         Engineering               | Team_1          |
-------------------------------------------------------------------------------------
|name_3_T1                     |         Engineering               | Team_1          |
-------------------------------------------------------------------------------------
|name_4_T1                     |         Engineering               | Team_1          |
-------------------------------------------------------------------------------------
|name_5_T1                     |         Engineering               | Team_1          |
-------------------------------------------------------------------------------------
|name_1_T2_1                   |         Team_2                    | 2_Team_1        |
-------------------------------------------------------------------------------------
|name_2_T2_1                   |         Team_2                    | 2_Team_1        |
-------------------------------------------------------------------------------------
|name_1_T2_2                   |         Team_2                    | 2_Team_2        |
------------------------------------------------------------------------------------- 
|name_2_T2_2                   |         Team_2                    | 2_Team_2        |
------------------------------------------------------------------------------------- 
|name_3_T2_2                   |         Team_2                    | 2_Team_2        |
------------------------------------------------------------------------------------- 

I need the number of employee per each team, result should looks like that:

--------------------------------------------------------------------
|        Team                |           count_employee             |
-------------------------------------------------------------------- 
| Engineering                |               10                     |
--------------------------------------------------------------------
| Team_1                     |               5                      |
--------------------------------------------------------------------
| Team_2                     |               5                      |
--------------------------------------------------------------------
| 2_Team_1                   |               2                      |
-------------------------------------------------------------------- 
| 2_Team_2                   |               3                      |
--------------------------------------------------------------------

I have Tried with:

Select  Team, count(employee_name),  from Employees Group by Team

I got those results :

--------------------------------------------------------------------
|        Team                |           count_employee             |
-------------------------------------------------------------------- 
| Engineering                |               1                      |
--------------------------------------------------------------------
| Team_1                     |               5                      |
--------------------------------------------------------------------
| 2_Team_1                   |               2                      |
-------------------------------------------------------------------- 
| 2_Team_2                   |               3                      |
-------------------------------------------------------------------- 

CodePudding user response:

Have a look at the extension ltree, https://www.postgresql.org/docs/current/ltree.html#id-1.11.7.30.8 so setting the correct path will enable following querys:

select count(*) as total from Employees where 'Engineering' @> path;
select count(*) as team1 from Employees where '*.Team_1' ~ path;
select count(*) as team2 from Employees where '*.Team_2.*' ~ path;
select count(*) as team2_1 from Employees where '*.2_Team_1' ~ path;
select count(*) as team2_1 from Employees where '*.2_Team_2' ~ path;

playground: https://www.db-fiddle.com/f/qErb2wGtrYMUt1cBiZ5NAx/0

CodePudding user response:

The tree structure that you've shown in the image is not properly mapped to the table. For example there's no way to know if Team_2 belongs to Engineering, so it's not possible to achieve the result you're looking for with the current table structure. I suggest breaking it into 2 tables: teams and employees

teams table would have the following data:

id   |    name     |  parent_team_id
-------------------------------------
1    | Engineering |      null
2    |   Team_1    |       1
3    |   Team_2    |       1
4    |  2_Team_1   |       3
5    |  2_Team_2   |       3

And employees table would look like

id   |        name         |  team_id
----------------------------------------
1    | head_of_engineering |      1
2    |      name_1_T1      |      2
3    |      name_2_T1      |      2
.    |          .          |      .
.    |          .          |      . 
10   |     name_3_T2_2     |      5  

This allows you to track back until you reach head of the organization i.e the employee with team_id whose parent_team_id is null

In order to get the total count at each level though, we'll have to use a recursive query as shown below

with recursive local_team_counts as (
  select t.name, t.id, t.parent_team_id, count(e.id) as size 
  from teams t
  left join employees e on t.id = e.team_id
  group by 1,2,3
  order by t.id
), total_counts as (
  select parent_team_id, id, size from local_team_counts
  union all
  select tc.parent_team_id, ltc.id, ltc.size from local_team_counts ltc 
  join total_counts tc on tc.id = ltc.parent_team_id
) select ltc.name, ltc.size   coalesce(tc.size, 0) size from local_team_counts ltc
left join (
    select parent_team_id, sum(size) as size
    from total_counts 
    group by 1 
) tc on tc.parent_team_id = ltc.id

Would highly recommend you to go through the shared link for recursive queries. Once that is clear, it'll make more sense. The idea here is that we first compute the local employee count for each team i.e the employees whose team_id matches the current. This is done in the local_team_counts cte which we'll use in the recursive cte to get the total counts. Note even though it says recursive at the top, local_team_counts is not recursive. The recursive property is only applicable when we use union as far as I understand (ref)

total_counts is a recursive cte. I'll try my best to explain but would highly recommend playing around and breaking the whole query and/or modifying the select statement at the end to understand the output at each level.

  1. First we simply fetch the results of local_team_counts as our base input. This means that the recursive query (the one after union all) will run for each of the rows in local_team_counts. In the given example there are 5 rows (5 teams)
  2. The recursive statement looks for the children for each of the base rows. Notice that going down the tree, we maintain the same parent_team_id as the base using tc.parent_team_id. This will allow us to sum the sizes based on parent.
  3. There's a catch however; the above strategy finds the total number of children of children but not the immediate children. For instance Team_1 wouldn't even appear in the result set of total_counts cte because it doesn't have any children of children. So we need to combine the sizes of local and total counts as the former contains the size of immediate children. That is exactly what we've done in the final select.

The query will give you the desired result. Though Engineering team would have 11 count instead of 10 because it'll count the head of engineering as well. If you don't want that, the fix is pretty simple. In the local_team_counts cte, add a check to ignore the counting the row where parent_team_id is null by modifying the left join.

left join employees e on t.id = e.team_id and t.parent_team_id is not null

Hope this helps. Check out db-fiddle to play around and understand more

  • Related