Home > Enterprise >  Improve the speed of this string_agg?
Improve the speed of this string_agg?

Time:03-02

I have data of the following shape:

BOM -- 500 rows, 4 cols
PartProject -- 2.6mm rows, 4 cols
Project -- 1000 rows, 5 cols
Part -- 200k rows, 18 cols

Yet when I try to do string_agg, my code will take me well over 10 minutes to execute on 500 rows. How can I improve this query (the data is not available).

select
    BOM.*,
    childParentPartProjectName
into #tt2 -- tt for some testing
from #tt1 AS BOM -- tt for some testing
-- cross applys for string agg many to one
CROSS APPLY (
    SELECT childParentPartProjectName = STRING_AGG(PROJECT_childParentPart.NAME, ', ') WITHIN GROUP (ORDER BY PROJECT_childParentPart.NAME)
    FROM (
        SELECT DISTINCT PROJECT3.NAME
    FROM [dbo].[Project] PROJECT3
    LEFT JOIN [dbo].[Part] P3 on P3.ITEM_NUMBER = BOM.childParentPart
    LEFT JOIN [dbo].[PartProject] PP3 on PP3.SOURCE_ID = P3.ID
    WHERE PP3.RELATED_ID = PROJECT3.ID and P3.CURRENT = 1
) PROJECT_childParentPart ) PROJECT3

CodePudding user response:

The subquery (within a subquery) you have has a code "smell" to it that it's been written with intention, but not correctly.

Firstly you have 2 LEFT JOINs in the subquery, however, both the tables aliased as P3 and PP3 are required to have a non-NULL value; that is impossible if no related row is found. This means the JOINs are implicit INNER JOINs.

Next you have a DISTINCT against a single column when SELECTing from multiple tables; this seems wrong. DISTINCT is very expensive and the fact you are using it implies that either NAME is not unique or that due to your implicit INNER JOINs you are getting duplicate rows. I assume it's the latter. As a results, very likely you should actually be using an EXISTS, not LEFT JOINs INNER JOINs.

The following is very much a guess, but I suspect it will be more performant.

SELECT BOM.*, --Replace this with an explicit list of the columns you need
       SA.childParentPartProjectName
INTO #tt2
FROM #tt1 BOM
     CROSS APPLY (SELECT STRING_AGG(Prj.NAME, ', ') WITHIN GROUP (ORDER BY Prj.NAME) AS childParentPartProjectName
                  FROM dbo.Project Prj --Don't use an alias that is longer than the object name
                  WHERE EXISTS (SELECT 1
                                FROM dbo.Part P
                                     JOIN dbo.PartProject PP ON P.ID = PP.SOURCE_ID
                                WHERE PP.Related_ID = Prg.ID
                                  AND P.ITEM_NUMBER = BOM.childParentPart
                                  AND P.Current = 1)) SA;
  • Related