SQL Query for multiple columns with one column distinct-CodePudding

I've spent an inordinate amount of time this morning trying to Google what I thought would be a simple thing. I need to set up an SQL query that selects multiple columns, but only returns one instance if one of the columns (let's call it case_number) returns duplicate rows.

select case_number, name, date_entered from ticket order by date_entered

There are rows in the ticket table that have duplicate case_number, so I want to eliminate those duplicate rows from the results and only show one instance of them. If I use "select distinct case_number, name, date_entered" it applies the distinct operator to all three fields, instead of just the case_number field. I need that logic to apply to only the case_number field and not all three. If I use "group by case_number having count (*)>1" then it returns only the duplicates, which I don't want.

Any ideas on what to do here are appreciated, thank you so much!

CodePudding user response：

You can use ROW_NUMBER(). For example

select *
from (
  select *,
    row_number() over(partition by case_number) as rn
) x
where rn = 1

The query above will pseudo-randomly pick one row for each case_number. If you want a better selection criteria you can add ORDER BY or window frames to the OVER clause.

CodePudding user response：

You can GROUP BY the common criteria and just apply MIN() or MAX() on the other attributes if any value for "name" inside the group is acceptable for you:

SELECT 
    case_number, 
    MIN(name) AS name, 
    MIN(date_entered) AS created, 
    MAX(date_entered) AS last_edit
FROM ticket 
GROUP BY case_number
ORDER BY date_entered

This should work in any RDBMS.

Also, it just doesn't pick a random value, but gives you control over it. If you are dealing with multiple "entries" inside a ticket, MIN(date_entered) definitely gives you the timestamp when the case_number first appeared. MAX() would give you the date of the last entry...

Please note:
The duplicated case_number entries in your ticket table indicate that the database is not properly normalized. Queries like this should not be necessary in a normalized schema. If you are developing a new application, you should consider a redesign.