Home > Back-end >  SQL Query for multiple columns with one column distinct
SQL Query for multiple columns with one column distinct

Time:10-16

I've spent an inordinate amount of time this morning trying to Google what I thought would be a simple thing. I need to set up an SQL query that selects multiple columns, but only returns one instance if one of the columns (let's call it case_number) returns duplicate rows.

select case_number, name, date_entered from ticket order by date_entered

There are rows in the ticket table that have duplicate case_number, so I want to eliminate those duplicate rows from the results and only show one instance of them. If I use "select distinct case_number, name, date_entered" it applies the distinct operator to all three fields, instead of just the case_number field. I need that logic to apply to only the case_number field and not all three. If I use "group by case_number having count (*)>1" then it returns only the duplicates, which I don't want.

Any ideas on what to do here are appreciated, thank you so much!

CodePudding user response:

You can use ROW_NUMBER(). For example

select *
from (
  select *,
    row_number() over(partition by case_number) as rn
) x
where rn = 1

The query above will pseudo-randomly pick one row for each case_number. If you want a better selection criteria you can add ORDER BY or window frames to the OVER clause.

CodePudding user response:

You can GROUP BY the common criteria and just apply MIN() or MAX() on the other attributes if any value for "name" inside the group is acceptable for you:

SELECT 
    case_number, 
    MIN(name) AS name, 
    MIN(date_entered) AS created, 
    MAX(date_entered) AS last_edit
FROM ticket 
GROUP BY case_number
ORDER BY date_entered

This should work in any RDBMS.

Also, it just doesn't pick a random value, but gives you control over it. If you are dealing with multiple "entries" inside a ticket, MIN(date_entered) definitely gives you the timestamp when the case_number first appeared. MAX() would give you the date of the last entry...

Please note:
The duplicated case_number entries in your ticket table indicate that the database is not properly normalized. Queries like this should not be necessary in a normalized schema. If you are developing a new application, you should consider a redesign.

  • Related