Queries on large table extremely slow, how can I optimize?-CodePudding

I have a database table with about 3.25 million rows. I'm experiencing extremely slow query times for simple queries, I don't think I'm trying to do anything crazy. I have basically zero knowledge of database optimization, so hopefully this is a simple fix.

The table holds contract data records, with an amount, a date, and some IDs related to other tables (VendorId, AgencyId, StateId), this is the database table:

CREATE TABLE [dbo].[VendorContracts]
(
    [Id] [uniqueidentifier] NOT NULL,   
    [ContractDate] [datetime2](7) NOT NULL,
    [ContractAmount] [decimal](19, 4) NULL, 
    [VendorId] [uniqueidentifier] NOT NULL,
    [AgencyId] [uniqueidentifier] NOT NULL,
    [StateId] [uniqueidentifier] NOT NULL,

    [CreatedBy] [nvarchar](max) NULL,
    [CreatedDate] [datetime2](7) NOT NULL,
    [LastModifiedBy] [nvarchar](max) NULL,
    [LastModifiedDate] [datetime2](7) NULL,
    [IsActive] [bit] NOT NULL,

    CONSTRAINT [PK_VendorContracts] 
        PRIMARY KEY CLUSTERED ([Id] ASC)
                WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, 
                      OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

I have these 3 indexes on the database. I didn't create these, looks like Entity Framework Core did automatically with a migration, so they could be wrong or something.

CREATE NONCLUSTERED INDEX [IX_VendorContracts_AgencyId] 
ON [dbo].[VendorContracts] ([AgencyId] ASC)
         WITH (STATISTICS_NORECOMPUTE = OFF, DROP_EXISTING = OFF, 
               ONLINE = OFF, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]

CREATE NONCLUSTERED INDEX [IX_VendorContracts_StateId] 
ON [dbo].[VendorContracts] ([StateId] ASC)
         WITH (STATISTICS_NORECOMPUTE = OFF, DROP_EXISTING = OFF, 
               ONLINE = OFF, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]

CREATE NONCLUSTERED INDEX [IX_VendorContracts_VendorId] 
ON [dbo].[VendorContracts] ([VendorId] ASC)
         WITH (STATISTICS_NORECOMPUTE = OFF, DROP_EXISTING = OFF, 
               ONLINE = OFF, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]

I have a page on my site where I'd like users to be able to select a vendor, and see the total amount of contract amounts, split by state.

I have this query:

SELECT SUM(ContractAmount), StateId
FROM [dbo].[VendorContracts]
WHERE VendorId = '...'
GROUP BY StateId

But it takes forever, like 15 - 20 minutes.

I tried other simpler queries, I figured maybe the SUM was slowing everything down, but those are incredibly slow too. I tried querying by vendor and state:

SELECT *
FROM [dbo].[VendorContracts]
WHERE VendorId = '...' AND StateId = '...'

And even though it only returns a couple thousands rows, it still takes way too long.

The strange this is that I have another database on a separate server with a similar table, that has about 6 million records, and it doesn't have issues like this at all. I can run the SUM query and other queries in less than a second or two. And that database table has the same indexes, so not sure why that one is so efficient, and this one isn't.

I've been inserting hundreds of thousands of records each day the past week, and I reorganized the indexes, but that didn't fix the issue, is there something else I need to do to optimize this? My database is on Azure, do I maybe need to change increase some settings or something?

CodePudding user response：

The problem is that your indexes do not cover your query. In other words: the server cannot service your query by using just one index, so either it will have to do a key lookup for every row, or more likely it will choose to just scan the whole table.

Generally, single-column indexes are not very useful for precisely this reason. You can change one of your existing indexes.

You want the equality = predicates from your WHERE to be the first columns in the index key.
Then you add in join columns and grouping columns. It is normally only worth it to add one of these at this stage, unless a join is on a unique value.
Finally, add in all other columns. These do not have to be part of the key, they can be INCLUDE columns.

For example:

CREATE NONCLUSTERED INDEX [IX_VendorContracts_VendorId] ON [dbo].[VendorContracts]
  (VendorId, StateId)
INCLUDE
  (ContractAmount)
WITH (DROP_EXISTING = ON, ONLINE = ON);