SQL Distinct Query with Order By using Group By Performance

Time:06-10

I have an MSSQL database table called DomainModels that has a primary key column 'Id' (GUID), a 'ModelGuid' (GUID) column and a 'GeneratedDate' (DateTime) column.

The table has lots of other columns, but for this question I will stick to just these three.

Each row in the table represents a model, where ModelGuid is the Id of the model, so a single model can span one or more rows.

As such, we can have multiple rows with the same ModelGuid but unique GeneratedDate values.

I am trying to build a query that returns only the latest row (by GeneratedDate) for each distinct ModelGuid, but without any luck.

This query shows all the data:

SELECT Id, ModelGuid, GeneratedDate FROM DomainModels
ORDER BY GeneratedDate;

Id                                      ModelGuid                               GeneratedDate
6FAE1B4C-66D2-4C73-9B09-D59B8FE24845    9D33A4F2-B94F-4125-9DC8-36B8DA821401    2022-06-06 05:39:40.9207283
02B02C91-AFE9-40E4-BFB2-EF7682516C49    9D33A4F2-B94F-4125-9DC8-36B8DA821401    2022-06-06 19:30:07.5008305
62903F73-9B97-4504-9B87-5CE17D7BB0AE    9D33A4F2-B94F-4125-9DC8-36B8DA821401    2022-06-06 22:00:52.8610504
8767A8A4-B6F9-4AF0-B8D5-12351AF55AD4    94D20197-1679-4BF6-BB43-F2FFA021641D    2022-06-07 18:27:05.5213174
777E13F0-55B4-4BAF-B1F7-3C4CFA75FBB6    9D33A4F2-B94F-4125-9DC8-36B8DA821401    2022-06-07 18:27:05.5213175
931CBDEB-7D5A-42A0-BED7-1BAB67AFDE79    9D33A4F2-B94F-4125-9DC8-36B8DA821401    2022-06-07 18:27:05.5213176

This query works, in that it returns the correct ModelGuid and GeneratedDate, but GeneratedDate is a non-indexed column and I am worried that it will become a performance issue with larger datasets:

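SELECT dm1.ModelGuid, dm1.GeneratedDate, dm1.Id
FROM DomainModels dm1
JOIN (SELECT ModelGuid, MAX(GeneratedDate) AS GeneratedDate
      FROM DomainModels
      GROUP BY ModelGuid) dm2
  -- match on ModelGuid too; the date alone can match another model's row with the same timestamp
  ON dm1.ModelGuid = dm2.ModelGuid
 AND dm1.GeneratedDate = dm2.GeneratedDate;

Result:
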
ModelGuid                               GeneratedDate               Id
94D20197-1679-4BF6-BB43-F2FFA021641D    2022-06-07 18:27:05.5213174 8767A8A4-B6F9-4AF0-B8D5-12351AF55AD4
9D33A4F2-B94F-4125-9DC8-36B8DA821401    2022-06-07 18:27:05.5213176 931CBDEB-7D5A-42A0-BED7-1BAB67AFDE79
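As a sanity check, here is a minimal sketch that runs the MAX()/GROUP BY join against the sample rows in an in-memory SQLite database via Python's standard sqlite3 module. SQLite stands in for SQL Server here, and GeneratedDate is stored as ISO-8601 text; that works only because every value has the same fixed 7-digit fraction, so text order matches chronological order. COLLISION_ROWS is hypothetical data in which model B's older row shares a timestamp with model A's latest row, showing what happens when the join matches on GeneratedDate alone.

```python
import sqlite3

# The question's sample rows: (Id, ModelGuid, GeneratedDate).
SAMPLE_ROWS = [
    ("6FAE1B4C-66D2-4C73-9B09-D59B8FE24845", "9D33A4F2-B94F-4125-9DC8-36B8DA821401", "2022-06-06 05:39:40.9207283"),
    ("02B02C91-AFE9-40E4-BFB2-EF7682516C49", "9D33A4F2-B94F-4125-9DC8-36B8DA821401", "2022-06-06 19:30:07.5008305"),
    ("62903F73-9B97-4504-9B87-5CE17D7BB0AE", "9D33A4F2-B94F-4125-9DC8-36B8DA821401", "2022-06-06 22:00:52.8610504"),
    ("8767A8A4-B6F9-4AF0-B8D5-12351AF55AD4", "94D20197-1679-4BF6-BB43-F2FFA021641D", "2022-06-07 18:27:05.5213174"),
    ("777E13F0-55B4-4BAF-B1F7-3C4CFA75FBB6", "9D33A4F2-B94F-4125-9DC8-36B8DA821401", "2022-06-07 18:27:05.5213175"),
    ("931CBDEB-7D5A-42A0-BED7-1BAB67AFDE79", "9D33A4F2-B94F-4125-9DC8-36B8DA821401", "2022-06-07 18:27:05.5213176"),
]

# Hypothetical rows: model B's older row has the same timestamp as model A's latest row.
COLLISION_ROWS = [
    ("row-1", "A", "2022-06-07 10:00:00.0000000"),
    ("row-2", "B", "2022-06-07 10:00:00.0000000"),
    ("row-3", "B", "2022-06-08 10:00:00.0000000"),
]


def latest_per_model(rows, match_model_guid=True):
    """Run the MAX()/GROUP BY join; return (ModelGuid, GeneratedDate, Id) tuples."""
    on_clause = "dm1.GeneratedDate = dm2.GeneratedDate"
    if match_model_guid:
        on_clause = "dm1.ModelGuid = dm2.ModelGuid AND " + on_clause
    query = f"""
        SELECT dm1.ModelGuid, dm1.GeneratedDate, dm1.Id
        FROM DomainModels dm1
        JOIN (SELECT ModelGuid, MAX(GeneratedDate) AS GeneratedDate
              FROM DomainModels
              GROUP BY ModelGuid) dm2
          ON {on_clause}
        ORDER BY dm1.ModelGuid, dm1.GeneratedDate
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE DomainModels (Id TEXT PRIMARY KEY, ModelGuid TEXT, GeneratedDate TEXT)"
        )
        conn.executemany("INSERT INTO DomainModels VALUES (?, ?, ?)", rows)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


print(latest_per_model(SAMPLE_ROWS))
print(latest_per_model(COLLISION_ROWS, match_model_guid=False))  # row-2 sneaks in
```

On the sample data both variants agree because no two rows share a timestamp; on the collision data only the variant that also matches ModelGuid returns one row per model.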

Is there a better, more efficient query I can run?

Answer:

A simple top-1-per-group query should suffice for this.

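Row numbering with ROW_NUMBER works well, given the right index. ROW_NUMBER returns exactly one row per ModelGuid, picking arbitrarily among rows that tie on GeneratedDate; if you want all tied rows, use DENSE_RANK instead.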

SELECT
  dm.Id,
  dm.ModelGuid,
  dm.GeneratedDate
FROM (
    SELECT *,
      -- number each model's rows, newest GeneratedDate first
      rn = ROW_NUMBER() OVER (PARTITION BY dm.ModelGuid ORDER BY dm.GeneratedDate DESC)
    FROM DomainModels dm
) dm
WHERE dm.rn = 1;  -- keep only the newest row per ModelGuid
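The query can be tried outside SQL Server as well. This sketch assumes SQLite 3.25 or later (the first version with window functions) through Python's sqlite3 module, and runs the query on the question's sample rows and on a hypothetical TIE_ROWS set where one model has two rows sharing its latest GeneratedDate. SQLite does not accept the T-SQL "rn = ..." alias syntax, so the sketch writes "AS rn"; the function name top_per_model is illustrative.

```python
import sqlite3

# The question's sample rows: (Id, ModelGuid, GeneratedDate).
SAMPLE_ROWS = [
    ("6FAE1B4C-66D2-4C73-9B09-D59B8FE24845", "9D33A4F2-B94F-4125-9DC8-36B8DA821401", "2022-06-06 05:39:40.9207283"),
    ("02B02C91-AFE9-40E4-BFB2-EF7682516C49", "9D33A4F2-B94F-4125-9DC8-36B8DA821401", "2022-06-06 19:30:07.5008305"),
    ("62903F73-9B97-4504-9B87-5CE17D7BB0AE", "9D33A4F2-B94F-4125-9DC8-36B8DA821401", "2022-06-06 22:00:52.8610504"),
    ("8767A8A4-B6F9-4AF0-B8D5-12351AF55AD4", "94D20197-1679-4BF6-BB43-F2FFA021641D", "2022-06-07 18:27:05.5213174"),
    ("777E13F0-55B4-4BAF-B1F7-3C4CFA75FBB6", "9D33A4F2-B94F-4125-9DC8-36B8DA821401", "2022-06-07 18:27:05.5213175"),
    ("931CBDEB-7D5A-42A0-BED7-1BAB67AFDE79", "9D33A4F2-B94F-4125-9DC8-36B8DA821401", "2022-06-07 18:27:05.5213176"),
]

# Hypothetical rows: model "A" has two rows sharing its latest GeneratedDate.
TIE_ROWS = [
    ("old-1", "A", "2022-06-06 10:00:00.0000000"),
    ("tie-1", "A", "2022-06-07 10:00:00.0000000"),
    ("tie-2", "A", "2022-06-07 10:00:00.0000000"),
]


def top_per_model(rows, rank_function="ROW_NUMBER"):
    """Return (Id, ModelGuid, GeneratedDate) for rows ranked 1 within their ModelGuid."""
    if rank_function not in ("ROW_NUMBER", "DENSE_RANK"):
        raise ValueError("rank_function must be ROW_NUMBER or DENSE_RANK")
    query = f"""
        SELECT Id, ModelGuid, GeneratedDate
        FROM (
            SELECT *,
                   {rank_function}() OVER (PARTITION BY ModelGuid
                                           ORDER BY GeneratedDate DESC) AS rn
            FROM DomainModels
        ) AS dm
        WHERE rn = 1
        ORDER BY ModelGuid, Id
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE DomainModels (Id TEXT PRIMARY KEY, ModelGuid TEXT, GeneratedDate TEXT)"
        )
        conn.executemany("INSERT INTO DomainModels VALUES (?, ?, ?)", rows)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


print(top_per_model(SAMPLE_ROWS))
print(len(top_per_model(TIE_ROWS)))                # one row; the tie is broken arbitrarily
print(len(top_per_model(TIE_ROWS, "DENSE_RANK")))  # both tied rows
```

Without ties the two ranking functions return the same rows; with a tie, DENSE_RANK keeps every row that shares the newest GeneratedDate.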

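For the ROW_NUMBER query to work efficiently, you need the following index:

CREATE INDEX IX_DomainModels_ModelGuid_GeneratedDate -- index name is illustrative
ON DomainModels (ModelGuid ASC, GeneratedDate DESC) INCLUDE (Id);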
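Roughly why the index matters: if rows are read in (ModelGuid ASC, GeneratedDate DESC) order, the newest row for each model is simply the first entry in its run, so the row numbering can be done in a single ordered pass without a separate sort. The Python sketch below models only that idea; the two-stable-sort "index" and the function names index_order and newest_per_model are illustrative assumptions, not how SQL Server actually stores or scans an index.

```python
from itertools import groupby
from operator import itemgetter

# (ModelGuid, GeneratedDate, Id) entries: the suggested index key plus its INCLUDE column.
ENTRIES = [
    ("9D33A4F2-B94F-4125-9DC8-36B8DA821401", "2022-06-06 05:39:40.9207283", "6FAE1B4C-66D2-4C73-9B09-D59B8FE24845"),
    ("9D33A4F2-B94F-4125-9DC8-36B8DA821401", "2022-06-06 19:30:07.5008305", "02B02C91-AFE9-40E4-BFB2-EF7682516C49"),
    ("9D33A4F2-B94F-4125-9DC8-36B8DA821401", "2022-06-06 22:00:52.8610504", "62903F73-9B97-4504-9B87-5CE17D7BB0AE"),
    ("94D20197-1679-4BF6-BB43-F2FFA021641D", "2022-06-07 18:27:05.5213174", "8767A8A4-B6F9-4AF0-B8D5-12351AF55AD4"),
    ("9D33A4F2-B94F-4125-9DC8-36B8DA821401", "2022-06-07 18:27:05.5213175", "777E13F0-55B4-4BAF-B1F7-3C4CFA75FBB6"),
    ("9D33A4F2-B94F-4125-9DC8-36B8DA821401", "2022-06-07 18:27:05.5213176", "931CBDEB-7D5A-42A0-BED7-1BAB67AFDE79"),
]


def index_order(entries):
    """Order entries like an index on (ModelGuid ASC, GeneratedDate DESC)."""
    ordered = sorted(entries, key=itemgetter(1), reverse=True)  # GeneratedDate DESC
    ordered.sort(key=itemgetter(0))  # stable sort keeps the date order within each ModelGuid
    return ordered


def newest_per_model(ordered_entries):
    """Single ordered pass: the first entry of each ModelGuid run is its newest row."""
    result = []
    for _model_guid, run in groupby(ordered_entries, key=itemgetter(0)):
        result.append(next(run))  # the rest of the run is skipped; no extra sort needed
    return result


print(newest_per_model(index_order(ENTRIES)))
```

Because the Id column travels with each entry, the pass never has to look anything up elsewhere, which is the role INCLUDE (Id) plays in the suggested index.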
