Home > Software engineering >  How to select the best item in each group?
How to select the best item in each group?

Time:04-02

I have table reports:

 ---- ----------- 
| id | file_name |
 ---- ----------- 
| 1  | jan.xml   |
 ---- ----------- 
| 2  | jan.csv   |
 ---- ----------- 
| 3  | feb.csv   |
 ---- ----------- 

In human language: there are reports for each month. Each report could be in XML or CSV format. There could be 1-2 reports for each month in unique format.

I want to select the reports for all months, picking only 1 file for each month. The XML format is more preferable.

So, expected output is:

 ---- ----------- 
| id | file_name |
 ---- ----------- 
| 1  | jan.xml   |
 ---- ----------- 
| 3  | feb.csv   |
 ---- ----------- 

Explanation: the file jan.csv was excluded since there is more preferable report for that month: jan.xml.

CodePudding user response:

As mentioned in the comments your data structure has a number of challenges. It really needs a column for ReportDate or something along those lines that is a date/datetime so you know which month the report belongs to. That would also give you something to sort by when you get your data back. Aside from those much needed improvements you can get the desired results from your sample data with something like this.

create table SomeFileTable
(
    id int
    , file_name varchar(10)
)

insert SomeFileTable
select 1, 'jan.xml' union all
select 2, 'jan.csv' union all
select 3, 'feb.csv'

select s.id
    , s.file_name
from
(
    select *
        , FileName = parsename(file_name, 2)
        , FileExtension = parsename(file_name, 1)
        , RowNum = ROW_NUMBER() over(partition by parsename(file_name, 2) order by case parsename(file_name, 1) when 'xml' then 1 else 2 end)
    from SomeFileTable
) s
where s.RowNum = 1
--ideally you would want to order the results but you don't have much of anything to work with in your data as a reliable sorting order since the dates are implied by the file name

CodePudding user response:

You may want to use a window function that ranks your rows by partitioning on the month and ordering by the format name, by working on the file_name field.

WITH ranked_reports AS ( 
    SELECT
        id,
        file_name,
        ROW_NUMBER() OVER(
            PARTITION BY LEFT(file_name, 3) 
            ORDER BY     RIGHT(file_name, 3) DESC
        ) AS rank
    FROM 
        reports
)
SELECT 
    id, 
    file_name
FROM 
    ranked_reports
WHERE 
    rank = 1 
  • Related