I have a database table with about 3.5 million rows. The table holds contract data records, with an amount, a date, and some IDs related to other tables (VendorId, AgencyId, StateId), this is the database table:
CREATE TABLE [dbo].[VendorContracts]
(
[Id] [uniqueidentifier] NOT NULL,
[ContractDate] [datetime2](7) NOT NULL,
[ContractAmount] [decimal](19, 4) NULL,
[VendorId] [uniqueidentifier] NOT NULL,
[AgencyId] [uniqueidentifier] NOT NULL,
[StateId] [uniqueidentifier] NOT NULL,
[CreatedBy] [nvarchar](max) NULL,
[CreatedDate] [datetime2](7) NOT NULL,
[LastModifiedBy] [nvarchar](max) NULL,
[LastModifiedDate] [datetime2](7) NULL,
[IsActive] [bit] NOT NULL,
CONSTRAINT [PK_VendorContracts]
PRIMARY KEY CLUSTERED ([Id] ASC)
WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
I have a page on my site where the user can filter a paged grid by VendorId
and ContractDate
, and sort by the ContractAmount
or ContractDate
. This is the query that EF Core produces when sorting by ContractAmount
for this particular vendor that has over a million records:
DECLARE @__vendorId_0 uniqueIdentifier = 'f39c7198-b05a-477e-b7bc-cb189c5944c0';
DECLARE @__startDate_1 datetime2 = '2017-01-01T07:00:00.0000000';
DECLARE @__endDate_2 datetime2 = '2018-01-02T06:59:59.0000000';
DECLARE @__p_3 int = 0;
DECLARE @__p_4 int = 50;
SELECT [v].[Id], [v].[AdminFee], [v].[ContractAmount], [v].[ContractDate], [v].[PONumber], [v].[PostalCode], [v].[AgencyId], [v].[StateId], [v].[VendorId]
FROM [VendorContracts] AS [v]
WHERE (([v].[VendorId] = @__vendorId_0) AND ([v].[ContractDate] >= @__startDate_1)) AND ([v].[ContractDate] <= @__endDate_2)
ORDER BY [v].[ContractAmount] ASC
OFFSET @__p_3 ROWS FETCH NEXT @__p_4 ROWS ONLY
When I run this, it takes 50s, whether sorting ASC
or DESC
or offsetting by thousands, it's always 50s.
If I look at my Execution Plan, I see that it does use my index, but the Sort Cost is what's making the query take so long
This is my index:
CREATE NONCLUSTERED INDEX [IX_VendorContracts_VendorIdAndContractDate] ON [dbo].[VendorContracts]
(
[VendorId] ASC,
[ContractDate] DESC
)
INCLUDE([ContractAmount],[AdminFee],[PONumber],[PostalCode],[AgencyId],[StateId])
WITH (STATISTICS_NORECOMPUTE = OFF, DROP_EXISTING = OFF, ONLINE = OFF, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF)
The strange thing is that I have a similar index for sorting by ContractDate
, and that one returns results in less than a second, even on the vendor that has millions of records.
Is there something wrong with my index? Or is sorting by a decimal
data type just incredibly intensive?
CodePudding user response:
You have an index that allows the
VendorId = @__vendorId_0 and ContractDate BETWEEN @__startDate_1 AND @__endDate_2
predicate to be seeked exactly.
SQL Server estimates that 6,657 rows will match this predicate and need to be sorted so it requests a memory grant suitable for that amount of rows.
In reality for the parameter values where you see the problem nearly half a million are sorted and the memory grant is insufficient and the sort spills to disc.
50 seconds for 10,299 spilled pages does still sound unexpectedly slow but I assume you may well be on some very low SKU in Azure SQL Database?
Some possible solutions to resolve the issue might be to
- Force it to use an execution plan that is compiled for parameter values with your largest vendor and wide date range (e.g. with
OPTIMIZE FOR
hint). This will mean an excessive memory grant for smaller vendors though which may mean other queries have to incur memory grant waits. - Use
OPTION (RECOMPILE)
so every invocation is recompiled for the specific parameter values passed. This means in theory every execution will get an appropriate memory grant at the cost of more time spent in compilation. - Remove the need for a sort at all. If you have an index on
VendorId, ContractAmount INCLUDE (ContractDate)
then theVendorId = @__vendorId_0
part can be seeked and the index read inContractAmount
order. Once 50 rows have been found that match theContractDate BETWEEN @__startDate_1 AND @__endDate_2
predicate then query execution can stop. SQL Server might not choose this execution plan without hints though.
I'm not sure how easy or otherwise it is to apply query hints through EF but you could look at forcing a plan via query store if you manage to get the desired plan to appear there.