Getting Index Scan instead Index Seeking if IN clause larger in Azure Sql-CodePudding

This is the Non clustered index we have :

ALTER TABLE [Allocation].[allocation_plan_detail] ADD  CONSTRAINT [UQ_AP_TENANT_TYPE_STATUS_PLAN_ITEM_CLUB] UNIQUE NONCLUSTERED 
(
    [tenant_id] ASC,
    [item_type] ASC,
    [allocation_plan_status] ASC,
    [allocation_plan_type] ASC,
    [item_nbr] ASC,
    [club_nbr] ASC
)

Now when I am running this query :

SELECT *
FROM   Allocation.allocation_plan_detail
WHERE  tenant_id = 'sams_us'
       AND item_type = 'inseason'
       AND allocation_plan_status = 'draft'
       AND allocation_plan_type = 'continuous'
       AND item_nbr IN ( 10177, 107, 109, 112,
                         511993, 117, 120, 122, 31889 )

In the execution plan I am getting the perfect Index Seek. But when the item_nbr is large lets say aroun 200 items its starting with constant scan and the query is extremely slow.

select * from Allocation.allocation_plan_detail where item_nbr in (72512,207317,...N(200 )) and allocation_plan_status='draft' and item_type ='inseason' and tenant_id='sams_us' and allocation_plan_type='continuous'` -> This is the issue.

Can anyone help me on this? How to resolve the issue? And is constant scan and table scan similar?

CodePudding user response：

The constant scan is not itself indicative of a problem.

This is normal if there are >= 64 predicates in an IN list as this is the maximum number of seek operations that can be applied to a seek operator directly in an execution plan.

It is still perfectly possible for rows to come out of the constant scan, go into a nested loops and be used to perform an index seek using your UQ_AP_TENANT_TYPE_STATUS_PLAN_ITEM_CLUB index.

This is not the plan you get however. You

Though when you add the WITH (FORCESEEK) hint you do get the earlier described plan

The reason why the execution plan is not chosen anyway is because your index does not cover all the many columns returned by this query so additional key lookups will be needed. SQL Server costs the plan with the scan of the covering index and hash join as 738.443 units vs 1195.91 units when you force the seek.

In this case SQL Server estimates that 462,724 rows will be returned and only 58,470 actually are so it over estimates the number of lookups that will be needed and corresponding costs of that option however.

If you do use the WITH (FORCESEEK) hint in production it does have a risk that for a different IN list you may end up encountering an eye watering number of lookups and it ends up making things significantly worse than the hash join plus covering index scan plan.

You may even be better breaking the query up into two stages. One that gets the clustered index keys of the rows to be returned into a temp table or table variable and then a semi join against that. This would allow other plans than just N lookups to be considered.

DECLARE @allocation_plan_detail_ids TABLE(allocation_plan_detail_id INT PRIMARY KEY);

INSERT @allocation_plan_detail_ids(allocation_plan_detail_id)
SELECT 
        allocation_plan_detail_id /*Implicitly Included in the index as CI key*/
from Allocation.allocation_plan_detail 
WHERE allocation_plan_status='draft' and item_type ='inseason' and tenant_id='sams_us' and allocation_plan_type='continuous' and item_nbr in (...);


select * 
from Allocation.allocation_plan_detail 
WHERE allocation_plan_detail_id IN 
(SELECT a.allocation_plan_detail_id FROM @allocation_plan_detail_ids a)
OPTION (RECOMPILE) /*So cardinality of result taken into account*/

In your case when you did hint the FORCESEEK the "real" execution time was actually significantly shorter (434 ms vs 18.123 seconds) - but unfortunately for some reason it spent a lot longer on ASYNC_NETWORK_IO waits (sending the results to the client and waiting for the client to process them).

This wait time by far dominated the overall elapsed time. As both queries are returning exactly the same results this seems like a transient issue assuming you are using exactly the same client in both cases.

One way to take the ASYNC_NETWORK_IO out of equation during development would be to simply SELECT ... INTO a temp table rather than do the final SELECT so nothing is sent to the client at all. Of course you need to remember to remove this when finished comparing the options so you do actually return the required results.