How to write a SQL statement to return Distinct records based on X, Y, and Z columns and the first v-CodePudding

I am struggling to wrap my head around this.

I have a data table like this:

TABLE:

Column_A	Column_X	Column_Y	Column_Z
1234	TOON	CITY	OFF
1235	STAR	CARS	OFF
1226	STAR	CARS	OFF
3234	TOON	CITY	OFF
4435	STAR	CARS	OFF
5555	STAR	CARS	OFF
5555	TOON	CITY	OFF
3333	STAR	CARS	OFF
1111	STAR	CARS	OFF

SELECT Distinct
       TABLE.Column_X,
       TABLE.Column_Y,
       TABLE.Column_Z,
  FROM TABLE

Will return me two records

TOON CITY OFF
STAR CARS OFF

But now I want the first instance of Column A in the table to have the results show like:

TOON CITY OFF 1234
STAR CARS OFF 1235

The thing I'm working on doesn't care about which number I put in a form, but it wants one of them. Its actually looking at like 50 distinct values and has 4 that I need a value but just one of them.

I've tried joining the table back to itself based on its distinct and select the TOP 1 record and using table2.Column_A for the results but that doesn't seem to work.

Group By seems like it might be the solution but I can't get that to work either.

SELECT Column_X, Column_Y, Column_Z, Top 1(Column_A)???
  FROM Table
 GROUP BY Column_X, Column_Y, Column_Z

CodePudding user response：

I would prefer here to simply fetch for example the one with the lowest value in Column_A:

SELECT Column_X, Column_Y, Column_Z, MIN(Column_A) AS Column_A
FROM yourtable
GROUP BY Column_X, Column_Y, Column_Z;

That's easy and short. It will produce this outcome based on your sample data:

Column_X	Column_Y	Column_Z	Column_A
STAR	CARS	OFF	1111
TOON	CITY	OFF	1234

I think this is better because mostly it doesn't matter which entry appears "first" in a table, but we prefer to select such entries having a lowest or highest value.

Which entry occurs as "first" in a table is usually random and means nothing unless there is an id or sequence which is used as "sorting column".

But anyway, if we really want to exactly choose the "first" two entries, we can use ROW_NUMBER:

WITH sub AS
(SELECT 
  ROW_NUMBER() OVER(PARTITION BY Column_X,  Column_Y, Column_Z 
  ORDER BY Column_X) AS RowNr,
  Column_A, Column_X,Column_Y,Column_Z
FROM yourtable)
SELECT Column_X,
       Column_Y,
       Column_Z,
       Column_A
  FROM sub
WHERE RowNr = 1
ORDER BY Column_A;

That leads to worse readability, but will produce exactly the same result which you have shown in your question:

Column_X	Column_Y	Column_Z	Column_A
TOON	CITY	OFF	1234
STAR	CARS	OFF	1235

Try out here: db<>fiddle

CodePudding user response：

You took a guess with

Top 1(Column_A)???

A typical solution is

MIN(Column_a)

Certainly there are others, such as MAX, or using an ORDER BY and then choosing the 1st row.

The key is to use an "aggregate" function with the GROUP BY. Some DB vendors, e.g. mysql, are a bit lax on this point, returning a single arbitrary row's value. But to be SQL92-compiliant, better to use the aggregate. That way you know the specification, the returned result will be completely predictable and reproducible.