I need to query some parent-child tree data in a single table. The structure of this table is defined by a customers database and cannot be changed.
The result we try to achieve is that we get one result row per table row with the rows data itself plus the path from to to row itself in a display friendly way.
Additionally the Top Most Parent that was used in the hierarchy (see row 8 as an example for an orphaned row, parent does not exist anymore, which can totally be the case at this customers database, as this will not render the data invalid in his use cases) and the most restrictive validity dates in the hierarchy (so the max ValidFrom and min ValidTo, NULL represents no restriction in Validity).
The table is defined like this:
CREATE TABLE [dbo].[HiTest]
(
[Id] [int] IDENTITY(1,1) NOT NULL,
[ParentId] [int] NULL,
[Name] [varchar](100) NOT NULL,
[ValidFrom] [datetime] NULL,
[ValidTo] [datetime] NULL,
CONSTRAINT [PK_HiTest]
PRIMARY KEY CLUSTERED ([Id] ASC)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
And the test data:
SET IDENTITY_INSERT [dbo].[HiTest] ON
GO
INSERT [dbo].[HiTest] ([Id], [ParentId], [Name], [ValidFrom], [ValidTo])
VALUES (1, NULL, N'First Level', NULL, NULL),
(2, 1, N'Second Level 1', CAST(N'2022-01-01T00:00:00.000' AS DateTime), CAST(N'2022-12-31T00:00:00.000' AS DateTime)),
(3, 1, N'Second Level 2', NULL, NULL),
(4, 2, N'Third Level 1', CAST(N'2022-02-01T00:00:00.000' AS DateTime), CAST(N'2022-12-31T00:00:00.000' AS DateTime)),
(5, 3, N'Third Level 2', CAST(N'2022-03-01T00:00:00.000' AS DateTime), CAST(N'2022-10-31T00:00:00.000' AS DateTime)),
(6, 4, N'Fourth Level 1a', CAST(N'2022-01-01T00:00:00.000' AS DateTime), CAST(N'2022-09-30T00:00:00.000' AS DateTime)),
(6, 4, N'Fourth Level 1a', CAST(N'2022-01-01T00:00:00.000' AS DateTime), CAST(N'2022-09-30T00:00:00.000' AS DateTime)),
(8, 23, N'Orphaned Level', NULL, NULL)
GO
SET IDENTITY_INSERT [dbo].[HiTest] OFF
And the expected result should be:
Can you hint me to the best solution for this scenario (a performant way, there are round about 120000 data rows in this table). At moment, I am playing around with an CTE solution, but I am not getting it to work.
Thank you in advance. SR
CodePudding user response:
A CTE is going to be the simplest option, given that you have multiple cascading rules that are not simple COALESCE scenarios, that is it not simple nullability checking. Judging by your output, the Agg Valid...
range can only be more restrictive than the parent range, so if the parent is valid from April to August of the same year, then the child cannot be valid before April, or after August. It can however be made valid from May to July as this is a more restrictive range than the parent.
Outside of the standard Hierarchy CTE logic the following conditions need to be tracked:
- Keep a reference to the original parent record's
[ParentId]
, pass this through unchanged. [Path]
is either the current[Id]
if[ParentId]
is NULL, if there is no existing parent record but there is a[ParentId]
value, then construct the path from the[ParentId]\[Id]
. Otherwise append the parent[Path]
with a '\' and the child[Id]
[ValidFrom]
COALESCE from the Parent (first _non-null) value- can be overridden if the child is greater than the parent
[ValidTo]
COALESCE from the Parent (first _non-null) value- can be overridden if the child is less than the parent
At the end of the Recursive CTE, we join back onto the main table to pickup the additional fields to display with the current record:
WITH Hierarchy AS (
SELECT [Id], [ParentId] -- < standard Hierarchy requirements
, [ParentId] AS [Master]
, [Path] = ISNULL(CAST([ParentId] AS VARCHAR(max)) '\' CAST([Id] AS VARCHAR(max)), CAST([Id] AS VARCHAR(max)))
, [ValidFrom], [ValidTo]
FROM [HiTest]
WHERE [ParentId] IS NULL OR NOT EXISTS (SELECT 1 FROM [HiTest] lookup WHERE lookup.[id] = [HiTest].[ParentId])
UNION ALL
SELECT [child].[Id], [child].[ParentId]
, [parent].[Master] --> don't change this value
, CONCAT([parent].[Path], '\', CAST([child].[Id] AS VARCHAR(max))) AS [Path]
, CASE
WHEN [parent].[ValidFrom] IS NULL THEN [child].[ValidFrom]
WHEN [child].[ValidFrom] > [parent].[ValidFrom] THEN [child].[ValidFrom]
ELSE [parent].[ValidFrom]
END
, CASE
WHEN [parent].[ValidTo] IS NULL THEN [child].[ValidTo]
WHEN [child].[ValidTo] < [parent].[ValidTo] THEN [child].[ValidTo]
ELSE [parent].[ValidTo]
END
FROM [HiTest] [child]
INNER JOIN Hierarchy parent ON [child].[ParentId] = [parent].[Id]
)
SELECT [HiTest].[Id], [HiTest].[ParentId], [Name], [HiTest].[ValidFrom], [HiTest].[ValidTo], [Path]
, [Hierarchy].[ValidFrom] AS [Agg. ValidFrom]
, [Hierarchy].[ValidTo] AS [Agg. ValidTo]
, [Master] AS [Top Most Parent Id]
FROM [Hierarchy]
INNER JOIN [HiTest] ON [Hierarchy].[Id] = [HiTest].[Id]
ORDER BY [Id]
View this fiddle here: https://dbfiddle.uk/5M6QLC8w
Usually we wouldn't sort a list like this by the child item Id (leaf), it would be more common to ORDER BY [Path]
(branch) to visualize the Hierarchy:
Id | ParentId | Name | ValidFrom | ValidTo | Path | Agg. ValidFrom | Agg. ValidTo | Top Most Parent Id |
---|---|---|---|---|---|---|---|---|
1 | null | First Level | null | null | 1 | null | null | null |
2 | 1 | Second Level 1 | 2022-01-01 | 2022-12-31 | 1\2 | 2022-01-01 | 2022-12-31 | null |
4 | 2 | Third Level 1 | 2022-02-01 | 2022-12-31 | 1\2\4 | 2022-02-01 | 2022-12-31 | null |
6 | 4 | Fourth Level 1a | 2022-01-01 | 2022-09-30 | 1\2\4\6 | 2022-02-01 | 2022-09-30 | null |
7 | 4 | Fourth Level 1b | null | null | 1\2\4\7 | 2022-02-01 | 2022-12-31 | null |
3 | 1 | Second Level 2 | null | null | 1\3 | null | null | null |
5 | 3 | Third Level 2 | 2022-03-01 | 2022-10-31 | 1\3\5 | 2022-03-01 | 2022-10-31 | null |
8 | 23 | Orphaned Level | 2011-01-01 | 2012-12-31 | 23\8 | 2011-01-01 | 2012-12-31 | 23 |