Composite Index Design (Narrow vs Wide) when searching on date range-CodePudding

I have an argument with my colleague regarding the design of the composite index. If two fields are being used in the query, I reckon the more unique field should be first.

Look at this example

CREATE TABLE [dbo].[tblPurchase](
    [PurchaseId] [int] NOT NULL,
    [ProductId] [int] NOT NULL,
    [ItemId] [int] NOT NULL,
    [PurchaseDate] [date] NOT NULL,
    [CurrencyId] [int] NOT NULL,
    ...
    ...
    ...
CONSTRAINT [PK_tblPurchase] PRIMARY KEY CLUSTERED 
(
    [PurchaseId] ASC,
    [ProductId] ASC,
    [ItemId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

At the moment we have 2 currencies and we have about 2 million records (note: we may add 2 new currencies in near future).

I decided to create an index using [PurchaseDate]-[CurrencyId] as the composite index, but my colleague argue that we should flip it to use [CurrencyId]-[PurchaseDate] as CurrencyId is less unique. Our query is:

SELECT *
FROM [tblPurchase]
WHERE [PurchaseDate] >= '2022-01-01' AND [PurchaseDate] <= '2022-06-01' AND [CurrencyId] = 1

Any thought?

CodePudding user response：

The optimal index for

WHERE [PurchaseDate] >= '2022-01-01' AND [PurchaseDate] <= '2022-06-01' AND [CurrencyId] = 1

is [CurrencyId]-[PurchaseDate]

this has nothing to do with selectivity.

It can use the first column to do an equality seek on CurrencyId and then a range seek on the second column so read exactly the right rows to resolve the query.

With the reverse index order the best that it can do is a range seek on the date predicate and then a residual predicate to discard the rows read for other currencies. So the index seek will read more rows in this case (assuming that other currency ids exist in the date range than just 1)