Home > Back-end >  How to write a SQL statement to return Distinct records based on X, Y, and Z columns and the first v
How to write a SQL statement to return Distinct records based on X, Y, and Z columns and the first v

Time:12-12

I am struggling to wrap my head around this.

I have a data table like this:

TABLE:

Column_A Column_X Column_Y Column_Z
1234 TOON CITY OFF
1235 STAR CARS OFF
1226 STAR CARS OFF
3234 TOON CITY OFF
4435 STAR CARS OFF
5555 STAR CARS OFF
5555 TOON CITY OFF
3333 STAR CARS OFF
1111 STAR CARS OFF
SELECT Distinct
       TABLE.Column_X,
       TABLE.Column_Y,
       TABLE.Column_Z,
  FROM TABLE

Will return me two records

TOON CITY OFF
STAR CARS OFF

But now I want the first instance of Column A in the table to have the results show like:

TOON CITY OFF 1234
STAR CARS OFF 1235

The thing I'm working on doesn't care about which number I put in a form, but it wants one of them. Its actually looking at like 50 distinct values and has 4 that I need a value but just one of them.

I've tried joining the table back to itself based on its distinct and select the TOP 1 record and using table2.Column_A for the results but that doesn't seem to work.

Group By seems like it might be the solution but I can't get that to work either.

SELECT Column_X, Column_Y, Column_Z, Top 1(Column_A)???
  FROM Table
 GROUP BY Column_X, Column_Y, Column_Z

CodePudding user response:

I would prefer here to simply fetch for example the one with the lowest value in Column_A:

SELECT Column_X, Column_Y, Column_Z, MIN(Column_A) AS Column_A
FROM yourtable
GROUP BY Column_X, Column_Y, Column_Z;

That's easy and short. It will produce this outcome based on your sample data:

Column_X Column_Y Column_Z Column_A
STAR CARS OFF 1111
TOON CITY OFF 1234

I think this is better because mostly it doesn't matter which entry appears "first" in a table, but we prefer to select such entries having a lowest or highest value.

Which entry occurs as "first" in a table is usually random and means nothing unless there is an id or sequence which is used as "sorting column".

But anyway, if we really want to exactly choose the "first" two entries, we can use ROW_NUMBER:

WITH sub AS
(SELECT 
  ROW_NUMBER() OVER(PARTITION BY Column_X,  Column_Y, Column_Z 
  ORDER BY Column_X) AS RowNr,
  Column_A, Column_X,Column_Y,Column_Z
FROM yourtable)
SELECT Column_X,
       Column_Y,
       Column_Z,
       Column_A
  FROM sub
WHERE RowNr = 1
ORDER BY Column_A;

That leads to worse readability, but will produce exactly the same result which you have shown in your question:

Column_X Column_Y Column_Z Column_A
TOON CITY OFF 1234
STAR CARS OFF 1235

Try out here: db<>fiddle

CodePudding user response:

You took a guess with

Top 1(Column_A)???

A typical solution is

MIN(Column_a)

Certainly there are others, such as MAX, or using an ORDER BY and then choosing the 1st row.

The key is to use an "aggregate" function with the GROUP BY. Some DB vendors, e.g. mysql, are a bit lax on this point, returning a single arbitrary row's value. But to be SQL92-compiliant, better to use the aggregate. That way you know the specification, the returned result will be completely predictable and reproducible.

  • Related