I have a fairly basic SQL query which runs in 1 second without the Data_1 field which is performing a STUFF() and using RN for filter and order. With the Data_1 field in the query the execution goes from 1 second to 25 seconds. If I remove the RN filter and ORDER BY from the STUFF function of Data_1 then it goes back to executing in 1 second with both Data_1 and Data_2 in the query. So the issue seems to be with the RN piece within the STUFF.
Is there anything that can be done to make this run quickly without using temp tables? The same thing works fine with temp tables instead of CTE but the requirement is to have this code in a view.
There are only 350 total entries in the table and the result. Running on MS SQL Server 2016 (13.0.7016.1)
Note: Data_1 field requirement is to show 6 most recent updates per Program in the JSON string but in order from oldest to newest. That's the only reason why I am using ROW_NUMBER because the underlying data can have alot more than 6 updates per Program.
WITH
CTE AS
(SELECT P.Program_Number,
P.Date_Status,
'{"date":"' P.Date_Status_Display '","percent":"' P.Percent_Complete '","status":":' P.Status_Overall_Col '"}' AS JSON_String,
ROW_NUMBER() OVER (PARTITION BY P.Program_Number ORDER BY P.Date_Status DESC) AS RN
FROM dbo.Main_Entries_Table)
SELECT P.[Program_Number],
P.[Program_Name],
'[' STUFF((SELECT ',' [JSON_String]
FROM CTE C
WHERE C.Program_Number = P.Program_Number
AND RN <= 6
ORDER BY RN DESC
FOR XML PATH('')),1,1,'') ']' AS Data_1,
'[' STUFF((SELECT ',' [JSON_String]
FROM CTE C
WHERE C.Program_Number = P.Program_Number
ORDER BY Date_Status ASC
FOR XML PATH('')),1,1,'') ']' AS Data_2,
P.Last_Updated
FROM dbo.Main_Entries_Table P;
CodePudding user response:
First, STRING_AGG
will not help you here. The optimizer leverages the same tricks to concatenate the string either way. STRING_AGG, however, is cleaner and handles conversions better but it would not solve this problem.
Next, for a good answer you should include DDL and sample data like so. This is what I'll use to show you what's up:
IF OBJECT_ID('tempdb..#Main_Entries_Table') IS NOT NULL DROP TABLE #Main_Entries_Table;
CREATE TABLE #Main_Entries_Table
(
Program_Number INT,
[Program_Name] VARCHAR(20),
Date_Status INT,
Status_Overall_Col INT,
Percent_Complete DECIMAL(4,2),
Date_Status_Display VARCHAR(10)
);
INSERT #Main_Entries_Table
VALUES(1,'ABC',1,10,.1,'Yay!'),(1,'ABC',1,40,.95,'blah'),(1,'XYZ',0,10,.03,'NA'),
(1,'ABC',3,44,.2,'Booo'),(1,'ABC',1,33,.35,'blah'),(1,'XYZ',0,999,.73,'NA'),
(2,'RRR',1,10,.1,'Booo'),(2,'RRR',1,90,.44,'blah'),(2,'RRR',0,10,.03,'NA'),
(2,'RRR',3,44,.2,'Booo'),(2,'RRR',1,93,.44,'blah'),(2,'RRR',0,55,.73,'NA');
Now lets look at your CTE query and the execution plan:
CTE Query Section
SELECT P.Program_Number,
P.Date_Status,
'{"date":"' LEFT(P.Date_Status_Display,4) '","percent":"' LEFT(P.Percent_Complete,6)
'","status":":' LEFT(P.Status_Overall_Col,4) '"}'
AS JSON_String,
ROW_NUMBER() OVER (PARTITION BY P.Program_Number ORDER BY P.Date_Status DESC) AS RN
FROM #Main_Entries_Table AS p
Execution plan:
Depending on your data, that can be a big ol' expensive sort. This index will fix that:
CREATE CLUSTERED INDEX idx_123 ON #Main_Entries_Table(Program_Number ASC, Date_Status DESC);
This ^^^ is what Itzik Ben-Gan calls a
Now for your Data_2 column (excluding Data_1):
WITH
CTE AS
(
SELECT P.Program_Number,
P.Date_Status,
'{"date":"' LEFT(P.Date_Status_Display,4) '","percent":"' LEFT(P.Percent_Complete,6)
'","status":":' LEFT(P.Status_Overall_Col,4) '"}'
AS JSON_String,
ROW_NUMBER() OVER (PARTITION BY P.Program_Number ORDER BY P.Date_Status DESC) AS RN
FROM #Main_Entries_Table AS p
)
SELECT
P.[Program_Number],
P.[Program_Name],
'[' STUFF((SELECT ',' [JSON_String]
FROM CTE C
WHERE C.Program_Number = P.Program_Number
ORDER BY Date_Status ASC
FOR XML PATH('')),1,1,'') ']' AS Data_2
FROM #Main_Entries_Table AS P;
Execution plan:
Both queries (inside and outside the CTE) leverage the index to eliminate the sort AND to perform a seek against your rows (vs a scan which is slower). Now for your Data_1 column.
WITH
CTE AS
(
SELECT P.Program_Number,
P.Date_Status,
'{"date":"' LEFT(P.Date_Status_Display,4) '","percent":"' LEFT(P.Percent_Complete,6)
'","status":":' LEFT(P.Status_Overall_Col,4) '"}'
AS JSON_String,
ROW_NUMBER() OVER (PARTITION BY P.Program_Number ORDER BY P.Date_Status DESC) AS RN
FROM #Main_Entries_Table AS p
)
SELECT
P.[Program_Number],
P.[Program_Name],
'[' STUFF((SELECT ',' [JSON_String]
FROM CTE C
WHERE C.Program_Number = P.Program_Number
AND RN <= 2
--ORDER BY RN DESC
FOR XML PATH('')),1,1,'') ']' AS Data_1
FROM #Main_Entries_Table AS p;
Here you will get will get a sort and scan if you include the ORDER BY clause. That said, you don't need it. With the aforementioned index in place, this will be quite fast:
WITH
CTE AS
(
SELECT P.Program_Number,
P.Date_Status,
'{"date":"' LEFT(P.Date_Status_Display,4) '","percent":"' LEFT(P.Percent_Complete,6)
'","status":":' LEFT(P.Status_Overall_Col,4) '"}' AS JSON_String,
ROW_NUMBER() OVER (PARTITION BY P.Program_Number ORDER BY P.Date_Status DESC) AS RN
FROM #Main_Entries_Table AS p
)
SELECT
P.[Program_Number],
P.[Program_Name],
'[' STUFF((SELECT ',' [JSON_String]
FROM CTE C
WHERE C.Program_Number = P.Program_Number
AND RN <= 2
-- ORDER BY RN DESC
FOR XML PATH('')),1,1,'') ']' AS Data_1,
'[' STUFF((SELECT ',' [JSON_String]
FROM CTE C
WHERE C.Program_Number = P.Program_Number
ORDER BY Date_Status ASC
FOR XML PATH('')),1,1,'') ']' AS Data_2
FROM #Main_Entries_Table AS P;
Check out the final plan:
The key here is understanding how to analyze the execution plan data to tune your SQL.
CodePudding user response:
The issue with the query appears to be the use of the ROW_NUMBER() function with the STUFF() function. The ROW_NUMBER() function is adding an additional sorting and filtering step to the query, which is causing it to run much slower. One way to improve the performance of the query would be to use a subquery or a join instead of the ROW_NUMBER() function to filter and sort the data. Another option would be to use a temporary table or a table variable to store the results of the ROW_NUMBER() function, and then join to that table in the outer query.
Also, you can try to use a more optimized indexing strategy on the table, which would help the query to run faster.