SUM query on millions of records is slow even with indexes, how can I optimize?


I have a database table with about 3.5 million records. The table holds contract records, each with an amount, a date, and some IDs that reference other tables (VendorId, AgencyId, StateId). This is the table definition:

CREATE TABLE [dbo].[VendorContracts]
(
    [Id] [uniqueidentifier] NOT NULL,   
    [ContractDate] [datetime2](7) NOT NULL,
    [ContractAmount] [decimal](19, 4) NULL, 
    [VendorId] [uniqueidentifier] NOT NULL,
    [AgencyId] [uniqueidentifier] NOT NULL,
    [StateId] [uniqueidentifier] NOT NULL,

    [CreatedBy] [nvarchar](max) NULL,
    [CreatedDate] [datetime2](7) NOT NULL,
    [LastModifiedBy] [nvarchar](max) NULL,
    [LastModifiedDate] [datetime2](7) NULL,
    [IsActive] [bit] NOT NULL,

    CONSTRAINT [PK_VendorContracts] 
        PRIMARY KEY CLUSTERED ([Id] ASC)
                WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, 
                      OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

I have a page on my site where I'd like users to be able to select a vendor, and see the total amount of contract amounts, split by state.

I have this query:

SELECT SUM(ContractAmount), StateId
FROM [dbo].[VendorContracts]
WHERE VendorId = '...'
GROUP BY StateId

And I created this index to help with that:

CREATE NONCLUSTERED INDEX [IX_VendorContracts_VendorId] ON [dbo].[VendorContracts]
(
    [VendorId] ASC
)
INCLUDE([Id],[StateId],[ContractAmount],[ContractDate],[AgencyId]) 
WITH (STATISTICS_NORECOMPUTE = OFF, DROP_EXISTING = OFF, ONLINE = OFF, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF)

This works great for most of my vendors; I get results back fairly quickly, in a second or two. But there are a couple of vendors that each have about 1 million records, and those take quite a while, usually a few minutes.

So I was wondering if there's any way to optimize further, either by modifying my index or in some other way?

Or am I going about this the wrong way? Maybe running such big SUM queries over 1 million records on the fly just isn't a great approach. Records aren't added to this table very frequently (we add maybe a few thousand once or twice a month), so I was thinking of having a nightly process that would compute the SUM data, store it in a separate table, and query that when I need to run reports.
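If I did go that route, I imagine it would look something like this (the summary table name here is just a placeholder, not something that exists yet):

CREATE TABLE [dbo].[VendorStateContractTotals]
(
    [VendorId]            [uniqueidentifier] NOT NULL,
    [StateId]             [uniqueidentifier] NOT NULL,
    [TotalContractAmount] [decimal](38, 4) NULL,
    CONSTRAINT [PK_VendorStateContractTotals]
        PRIMARY KEY CLUSTERED ([VendorId] ASC, [StateId] ASC)
)

-- Nightly job: rebuild the totals in one pass over the base table
TRUNCATE TABLE [dbo].[VendorStateContractTotals]

INSERT INTO [dbo].[VendorStateContractTotals] (VendorId, StateId, TotalContractAmount)
SELECT VendorId, StateId, SUM(ContractAmount)
FROM [dbo].[VendorContracts]
GROUP BY VendorId, StateId

Then the page query would just read from that table instead.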

CodePudding user response:

You should take the query...

SELECT SUM(ContractAmount) AS TotalContractAmount,
       VendorId,
       StateId
FROM   [dbo].[VendorContracts]
GROUP  BY VendorId,
          StateId

... and make it into an indexed view, with a unique clustered index on (VendorId, StateId), so the results are pre-calculated and maintained automatically by the system.
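As a sketch of what that could look like (the names here are mine; note that an indexed view requires SCHEMABINDING and a COUNT_BIG(*) column, and SUM over a nullable column isn't allowed in an indexed view, hence the ISNULL wrapper around ContractAmount):

-- Indexed view definition (illustrative names)
CREATE VIEW [dbo].[vw_VendorStateContractTotals]
WITH SCHEMABINDING
AS
SELECT VendorId,
       StateId,
       SUM(ISNULL(ContractAmount, 0)) AS TotalContractAmount,
       COUNT_BIG(*) AS RowCnt
FROM   [dbo].[VendorContracts]
GROUP  BY VendorId,
          StateId
GO

-- The unique clustered index is what materializes and maintains the view
CREATE UNIQUE CLUSTERED INDEX [IX_vw_VendorStateContractTotals]
    ON [dbo].[vw_VendorStateContractTotals] (VendorId, StateId)
GO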

Indexed views can add overhead to data modifications, but you say:

Records aren't added very frequently to this table, we maybe add a few thousand once or twice a month,

On Azure SQL Database, SQL Server should do the indexed view matching automatically and recognize that your original query can be answered from the indexed view.

If you find this matching does not happen (check the execution plan), then you may need to select from the indexed view explicitly and add a NOEXPAND hint.
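For example, using the view sketched above:

SELECT TotalContractAmount, StateId
FROM   [dbo].[vw_VendorStateContractTotals] WITH (NOEXPAND)
WHERE  VendorId = '...'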

CodePudding user response:

If you have an operational database system where you also want to do some reporting, then you should consider a columnstore index. Azure SQL DB supports both clustered and nonclustered columnstore indexes. Columnstores are great for reporting/analytics, both because they are highly compressed compared to B-trees/heaps and because they use an optimized execution engine called batch mode that is substantially more efficient for queries over lots of rows, like your report query.

You can read about the syntax in Microsoft's columnstore indexes documentation.

Without knowing more about your workload, I can't say for sure whether you should use a clustered or nonclustered columnstore. However, it is relatively easy to try a nonclustered one to see if it helps your app. While an indexed view can make a specific query pattern go faster, it does have some limitations: for example, if you have a lot of updates happening on your main table, those update queries now have to take locks on the main table's indexes as well as those on the indexed view, which can potentially cause blocking. Try batch mode via a nonclustered columnstore index (you will want to move to compatibility level 150 for the latest performance enhancements if you have not already) to see if this solves your problem.
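A rough sketch of what trying that could look like (the index name and column list here are just illustrative):

CREATE NONCLUSTERED COLUMNSTORE INDEX [NCCI_VendorContracts_Reporting]
    ON [dbo].[VendorContracts] (VendorId, StateId, ContractAmount, ContractDate, AgencyId)

-- If the database is still on an older compatibility level:
ALTER DATABASE CURRENT SET COMPATIBILITY_LEVEL = 150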
