Home > Mobile >  Temporarily storing tables with variable amounts of columns
Temporarily storing tables with variable amounts of columns

Time:12-24

Suppose I have a hierarchical structure like this.

Father ID Child ID
1 2
1 3
1 7
3 4
3 5
3 6

In my case, those are production orders, that demand from (father) or supply to (child) each other. I need to bring those orders to a structure like this:

Master Father Hierarchy Child
1 001.001 2
1 001.002 3
1 001.002.001 4
1 001.002.002 5
1 001.002.003 6
1 001.003 7

Afterwards I have to join the components to the child orders, that were used to create the master father. The target is, to trace the origin of the components, that were used to create a certain product.

The big problem is, that I can't predict, how deep those tree structures will become in the future. But I want to create a report, that won't break, when they get deeper.

So far I counted the depth manually. If the Master Father is 0 then I have 2 Levels in my example case. The (pseudo) code for the transformation would look like this:

Select 
    L.Father_ID as Level0
    , '001' as Hierarchy0
    , L.Child_ID as Level1
    , Dense_Rank() over (partition by L.Father_ID) as Hierarchy1
    , R1.Child_ID as Level2
    , Dense_Rank() over (partition by L.Father_ID, L.Child_ID) as Hierarchy2
Into #exploded_table
From Table as L
Left Join Table as R1
on L.Child_ID  = R1.Father_ID 

Select distinct
    Level0 as Master_Father
    , Concat(Hierarchy0,'.',format(Hirarchy1, '000')) as Hierarchy
    , Level1 as Child
From #exploded_table
Union all
Select distinct
    Level0 as Master_Father
    , Concat(Hierarchy0,'.',format(Hirarchy1, '000'),'.',format(Hirarchy2, '000')) as Hierarchy
    , Level2 as Child
From #exploded_table

I have two problems with this code.

  1. It gets longer with every level
  2. It will break if more levels will be added in the future

Therefore I started to write dynamic code. It first counts the depth of the deepest tree and then it dynamically creates the code with as much levels as are needed.

When you execute this code the amount of columns is (as far as I understand it) called "Non-Deterministic". And MS SQL hates "Non-Deterministic" things. To be allowed to store this as a temp table I have to create a temp table outside of the scope of the EXEC sp_executesql function. And then I have to dynamically modify the columns of this table to fit exactly to the result of the dynamic SQL statement that I pass to the Exec function. I won't use global temp tables since this will lead to chaos when multiple reports have the same names for their temp tables.

The above described method is complicated and unreadable! I can do it, but it feels like the opposite of good practice.

So I want to ask the community whether this can't be done in a simpler way. I came from a company that used PySpark where this was easily doable and now I am here and all I have is SQL Server and the SQL Server Reporting Services.

CodePudding user response:

I believe you are mostly along the right track. However, as you have defined columns (e.g., Master_Father_ID, Hierarchy, and Child_ID) you don't need dynamic SQL.

Rather than a recursive CTE, I've written the below as a WHILE loop to help understand what's happening. It's easy enough to convert to a recusive CTE but I think (especially to many people familiar with other programming languages) that a WHILE loop is more familiar.

The logic of the below is as follows

  1. Find any 'Master fathers' and insert them into the working table
  2. For each value inserted in the last step, find their direct children and insert them into the working table
  3. Repeat step 2 above (e.g., round 3 finds children of children; round 4 finds children of children of children) until no more data is inserted

Then the final report needs to simply ignore the initial Master_Father rows inserted (e.g., what would be 001, 002 etc in your table above).

The SQL below can also be found in this db<>fiddle - noting that I have added extra data for demonstration purposes.

Initial data

CREATE TABLE #exploded_table (
        Father_ID int, 
        Child_ID int, 
        PRIMARY KEY (Father_ID, Child_ID)
        );

INSERT INTO #exploded_table (Father_ID, Child_ID) VALUES
(1,  2),  (1,  3),  (1,  7),  (3,  4),  (3,  5),  (3,  6);

Now for the working table - I've called it #summary_table. Current_ID represents the current Child_ID for that row (e.g., at the deepest level of the hierarchy). insert_round is the round number (loop iteration) that the row was inserted.

I also have two variables: @round that indicates which round/cycle we're up to, and @n indicating the number of rows inserted in the current round.

CREATE TABLE #summary_table (
            Master_Father_ID int, 
            Current_ID int, 
            Hierarchy nvarchar(500), 
            insert_round int
            );

DECLARE @round int = 1;
DECLARE @n int = 0;

Step 1: Insert master fathers

Master fathers are those that are not themselves children e.g., Father_Id 1 is not a child, so it is included; Father_ID 3 is a child, therefore isn't a master father.

Note that I'm using RIGHT(N'000' LTRIM(STR(int value)), 3) to convert any integer number to 3-digit format as a string.

INSERT INTO #summary_table (Master_Father_ID, Current_ID, Hierarchy, insert_round)
    SELECT  DISTINCT Father_ID, 
            Father_ID, 
            RIGHT(N'000'   LTRIM(STR(DENSE_RANK() OVER (ORDER BY Father_ID))), 3), 
            @round
    FROM    #exploded_table
    WHERE   Father_ID NOT IN (SELECT Child_ID FROM #exploded_table);

SET @n = @@ROWCOUNT;

Step 2: Find any children of the last round and insert their data into the working table

(Also Step 3: Repeat until no more is happening)

Taking the data from the last round of insertions (identified by #summary_table.insert_round), identify any children and insert their data into the working table.

Keep doing this until you don't insert any more rows.

WHILE @n > 0
    BEGIN

    INSERT INTO #summary_table (Master_Father_ID, Current_ID, Hierarchy, insert_round)
        SELECT  DISTINCT
                #summary_table.Master_Father_ID, 
                #exploded_table.Child_ID, 
                #summary_table.Hierarchy   N'.'   RIGHT(N'000'   LTRIM(STR(DENSE_RANK() OVER (PARTITION BY #summary_table.Master_Father_ID, #summary_table.Current_ID ORDER BY #exploded_table.Child_ID))), 3),
                @round   1
        FROM    #summary_table
                INNER JOIN #exploded_table ON #summary_table.Current_ID = #exploded_table.Father_ID
        WHERE   #summary_table.insert_round = @round;

    SET @n = @@ROWCOUNT;

    SET @round  = 1;
    
    END;

Then the final report

SELECT      Master_Father_ID, Hierarchy, Current_ID AS Child_ID
    FROM    #summary_table 
    WHERE   insert_round > 1
    ORDER BY Hierarchy;

Data is as follows

Master_Father_ID    Hierarchy       Child_ID
1                   001.001         2
1                   001.002         3
1                   001.002.001     4
1                   001.002.002     5
1                   001.002.003     6
1                   001.003         7

Note - this assumes your original data is set up properly and has no loops/etc (e.g., if Kyle Reese was John Connor's child as well as his father... err sorry, 'spolier alert'). If you have those, you need to add extra checks - and those checks will depend on how you actually want to deal with those loops.

CodePudding user response:

You should be able to do this with Recursive CTE.

I have also included an example of a build in SQL Server hierarchy data type (HIERARCHYID).

See comments in code for details

-- Test data based on your example
CREATE TABLE #temp( FatherID INT, ChildID INT )
INSERT INTO #temp
VALUES ( 1, 2 ), ( 1, 3 ), ( 1, 7 ), ( 3, 4 ), ( 3, 5 ), ( 3, 6 ),
-- I have added a 3rd level
( 6, 8 ), ( 6, 9 )
-- SELECT * FROM #temp;

-- Recursive CTE to get Children of Children and construct Hierarchy
;WITH Hierarchies( MasterFatherID, FatherID, ChildID, Hierarchy, Hierarchy2 )
AS(
    -- This is the "anchor" part
    SELECT FatherID AS MasterFatherID, FatherID, ChildID,
        FORMAT( DENSE_RANK() OVER( PARTITION BY FatherID ORDER BY ChildID ), '000' ) AS Hierarchy,
        -- This is an example of SQL Server built in Hierarchy data type
        CAST( '/'   FORMAT( DENSE_RANK() OVER( PARTITION BY FatherID ORDER BY ChildID ), '0' )   '/' AS HIERARCHYID ) AS Hierarchy2
    FROM #temp
    UNION ALL
    SELECT
        -- Top level Parent
        Hierarchies.MasterFatherID,
        -- Current Parent
        t.FatherID,
        -- Current Child
        t.ChildID,
        Hierarchies.Hierarchy   '.'   FORMAT( DENSE_RANK() OVER( PARTITION BY t.FatherID ORDER BY t.ChildID ), '000' ) AS Hierarchy,
        -- This is an example of SQL Server built in Hierarchy data type
        HIERARCHYID::Parse( Hierarchies.Hierarchy2.ToString()   FORMAT( DENSE_RANK() OVER( PARTITION BY t.FatherID ORDER BY t.ChildID ), '0' )   '/' ) AS Hierarchy2
    FROM Hierarchies
        INNER JOIN #temp AS t ON Hierarchies.ChildID = t.FatherID
)
SELECT MasterFatherID, FatherID, ChildID, Hierarchy, Hierarchy2.ToString() AS Hierarchy2
FROM Hierarchies AS a
-- This will exclude Children (e.g. [3, 4]) from being included as their own chain
WHERE NOT EXISTS( SELECT * FROM Hierarchies AS Children WHERE Children.ChildID = a.MasterFatherID )
ORDER BY MasterFatherID

Results

MasterFatherID FatherID    ChildID     Hierarchy      Hierarchy2
-------------- ----------- ----------- -------------- ----------
1              1           2           001            /1/
1              1           3           002            /2/
1              1           7           003            /3/
1              3           4           002.001        /2/1/
1              3           5           002.002        /2/2/
1              3           6           002.003        /2/3/
1              6           8           002.003.001    /2/3/1/
1              6           9           002.003.002    /2/3/2/
  • Related