Home > Software design >  How to select only the first rows for each combination of a set of columns?
How to select only the first rows for each combination of a set of columns?

Time:09-14

I found this article helpful: How to select only the first rows for each unique value of a column?

However, it did not fully cover my requirements. To be honest, I don't think I fully understood the purpose / how to use the partition statement.

Starting with this MS SQL Table, I'd like to select the latest date for each File/Device combination.

 ----------------------- ------------------------ ------------------------ 
|         Date          |          File          |         Device         |
 ----------------------- ------------------------ ------------------------ 
|  09/09/2022           |  Acrobat.exe           |         ABC123         |
|  09/10/2022           |  Acrobat.exe           |         DEF456         |
|  11/07/2021           |  Visio.exe             |         ABC123         |
|  12/31/2020           |  Acrobat.exe           |         ABC123         |
 ----------------------- ------------------------ ------------------------ 

I wrapped my head around this for a while now and the only thing I could come up with was

with cte as (
    SELECT Date, File, Device, 
    ROW_NUMBER() over (partition by Device order by File, Date DESC) as [r]
    FROM MyTable
)
select Date, File, Device, from cte WHERE [r] = 1

However, this will not always return exactly what I need. I'm not quite sure why but I think because it's only taking the change of one column into consideration for the row number to increase it's value.

CodePudding user response:

I'd like to select the latest date for each File/Device combination.
So you mean something like this ?

select max(t.mydate) as [date],
       t.myfile,
       t.device
from   mytable t
group by t.myfile, t.device
order by t.myfile

See this DBFiddle

the result is

date myfile device
2022-09-09 Acrobat.exe ABC123
2022-09-10 Acrobat.exe DEF456
2021-11-07 Visio.exe ABC123

CodePudding user response:

I think because it's only taking the change of one column into consideration

Right, because that's what you asked for in the over() clause. Specifically, partition by Device. Given your requirement for "select the latest date for each File/Device combination", your definition for the r column in your CTE should look like:

ROW_NUMBER() over (partition by File, Device order by Date DESC) as [r]
  • Related