Why are my execution plan estimates so inaccurate?-CodePudding

I have written relatively complicated query that returns 7k rows in 1-2s.

But when I add where condition as follows, it takes 10-20s

WHERE exists (select 1 from BOM where CustomerId = @__customerId_0 and Code = queryAbove.Code)

I've checked the execution plan and the problem is not BOM from where clause itself (it is indexed and returns ccs 100 rows).

According to actual execution plan, 90% of the cost consumed by subquery that is fast without this where condition.

The question is, why are the estimates from execution plan so inaccurate? Is this "normal", or something is off with my database

I have tried sp_updatestats and rebuild indexes - it didn't have any effect.

As per comment, here is the query and execution plan: https://www.brentozar.com/pastetheplan/?id=HkKgCWGuq and my indexes:


ALTER TABLE [dbo].[ČíselníkMateriál] ADD  CONSTRAINT [PK_SkladČíselníkMateriál] PRIMARY KEY CLUSTERED 
(   [KódMateriálu] ASC  )

ALTER TABLE [dbo].[SkladVýdajMateriál] ADD  CONSTRAINT [PK_SkladVýdajMateriál] PRIMARY KEY CLUSTERED 
(   [ID] ASC    )

ALTER TABLE [dbo].[Výrobky] ADD  CONSTRAINT [Výrobky_PK] PRIMARY KEY CLUSTERED 
(   [ID] ASC    )

ALTER TABLE [dbo].[VýrobkyMateriál] ADD  CONSTRAINT [PK_VýrobkyMateriál] PRIMARY KEY CLUSTERED 
(   [ID] ASC    )

ALTER TABLE [dbo].[Zákazky] ADD  CONSTRAINT [Zákazky_PK] PRIMARY KEY CLUSTERED 
(   [ID] ASC    )




CREATE UNIQUE NONCLUSTERED INDEX [UQ_ČíselníkMateriál_ID] ON [dbo].[ČíselníkMateriál]
(   [ID] ASC ) WITH (IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 80, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]

CREATE NONCLUSTERED INDEX [IX_SkladVýdajMateriál_IDVýrobky] ON [dbo].[SkladVýdajMateriál]
(   [IDVýrobky] ASC )

CREATE NONCLUSTERED INDEX [IX_SkladVýdajMateriál_KódMateriálu] ON [dbo].[SkladVýdajMateriál]
(   [KódMateriálu] ASC  )


CREATE NONCLUSTERED INDEX [IX_Výrobky_Status] ON [dbo].[Výrobky]
(   [Status] ASC    )

CREATE NONCLUSTERED INDEX [NonClusteredIndex-20220527-162213] ON [dbo].[Výrobky]
(   [ID] ASC    ) INCLUDE([IDZákazky],[Množstvo]) WITH (DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]

CREATE NONCLUSTERED INDEX [VýrobkyIDZákazky] ON [dbo].[Výrobky]
(   [IDZákazky] ASC ) INCLUDE([Množstvo]) 

CREATE NONCLUSTERED INDEX [IX_VýrobkyMateriál_IdMateriálu] ON [dbo].[VýrobkyMateriál]
(   [IdMateriálu] ASC   ) INCLUDE([IdVýrobku],[Množstvo],[ID]) 

CREATE NONCLUSTERED INDEX [IX_VýrobkyMateriál_IdVýrobku] ON [dbo].[VýrobkyMateriál]
(   [IdVýrobku] ASC ) INCLUDE([IdMateriálu],[Množstvo],[ID]) 

CREATE NONCLUSTERED INDEX [IDX_Zákazky_Status] ON [dbo].[Zákazky]
(   [Status] ASC    ) INCLUDE([ID])

CodePudding user response：

why are the estimates so inaccurate?

In my experience, JOIN too many tables that might let estimates more and more inaccurate in one query, because of that estimates rows histogram statistics and formula SQL Server Join Estimation using Histogram Coarse Alignment

I would try to rewrite your query first, but I didn't know the logic of your tables.

There are two Key Lookup that might cost more logical read from non-cluster index Lookup to cluster index, so I would suggest modifying there are two indexes.

using IdVýrobku column be the key instead of INCLUDE column inIX_VýrobkyMateriál_IdMateriálu index, because IdVýrobku is JOIN key.

CREATE NONCLUSTERED INDEX [IX_VýrobkyMateriál_IdMateriálu] ON [dbo].[VýrobkyMateriál]
([IdMateriálu] ASC,[IdVýrobku]) 
INCLUDE([Množstvo],[ID]);

Add include Množstvo column on IX_SkladVýdajMateriál_IDVýrobky which add data in leaf page might reduce Key Lookup to cluster index from IX_SkladVýdajMateriál_IDVýrobky non-clustered index.

CREATE NONCLUSTERED INDEX [IX_SkladVýdajMateriál_IDVýrobky] ON [dbo].[SkladVýdajMateriál]
( [IDVýrobky] ASC ) 
INCLUDE (Množstvo);

CodePudding user response：

D-Shih is right. When you JOIN many tables, wrong estimates get multiplied into really wrong estimates. But the first wrong estimates (506 customer/materials and not 65) is due to the distribution of data in isys.MaterialCustomers.

I'm not sure if 65 is the average number of materials per customer or just the average in that particular bucket, but you end up with 8x the number of lookups SQL Server estimated, and from there it just compounds.

SQL Server thinks it only needs to do 65 iterations on the initial query, but in fact has to do 500. Had it known in advance, maybe it would have chosen another plan. If you look for customerIds with fewer materials, you should get a faster execution, but also inaccurate estimates because now you're looping over fewer than 65 materials.

One way to fix you query is to

SELECT Code
INTO #CodesToConsider
FROM isys.MaterialCustomers
WHERE CustomerId = @__customer_id;

(... your huge select ...)
AS [s]
INNER JOIN #CodesToConsider ctc ON (s.MaterialCode = ctc.Code);

The temp-table will provide SQL Server with statistics on how many Codes to consider and then recompile the second part if those statistics change. They need to change by a bit, so I'm a bit unsure if 65 -> 506 is enough, but that's for you to find out.