I've spent an inordinate amount of time this morning trying to Google what I thought would be a simple thing. I need to set up an SQL query that selects multiple columns, but only returns one instance if one of the columns (let's call it case_number) returns duplicate rows.
select case_number, name, date_entered from ticket order by date_entered
There are rows in the ticket table that have duplicate case_number, so I want to eliminate those duplicate rows from the results and only show one instance of them. If I use "select distinct case_number, name, date_entered" it applies the distinct operator to all three fields, instead of just the case_number field. I need that logic to apply to only the case_number field and not all three. If I use "group by case_number having count (*)>1" then it returns only the duplicates, which I don't want.
Any ideas on what to do here are appreciated, thank you so much!
CodePudding user response:
You can use ROW_NUMBER()
. For example
select *
from (
select *,
row_number() over(partition by case_number) as rn
) x
where rn = 1
The query above will pseudo-randomly pick one row for each case_number
. If you want a better selection criteria you can add ORDER BY
or window frames to the OVER
clause.
CodePudding user response:
You can GROUP BY the common criteria and just apply MIN() or MAX() on the other attributes if any value for "name" inside the group is acceptable for you:
SELECT
case_number,
MIN(name) AS name,
MIN(date_entered) AS created,
MAX(date_entered) AS last_edit
FROM ticket
GROUP BY case_number
ORDER BY date_entered
This should work in any RDBMS.
Also, it just doesn't pick a random value, but gives you control over it. If you are dealing with multiple "entries" inside a ticket, MIN(date_entered)
definitely gives you the timestamp when the case_number
first appeared. MAX() would give you the date of the last entry...
Please note:
The duplicated case_number
entries in your ticket table indicate that the database is not properly normalized. Queries like this should not be necessary in a normalized schema. If you are developing a new application, you should consider a redesign.