How to SELECT DISTINCT() and then keep rows with most frequent value in another column?-CodePudding

Sample table columns:

The first step is to identify unique values of X:

SELECT distinct(X) FROM nodes ORDER BY X

Result:

  2
  5
255

Note: Order by isn't necessary, it's there to make the result easier to read.

For each of these values in column x, I would like to find the most frequent value in column y (only for the rows sharing the same X value, not the entire table) and output that single row.

I can do that by hand very easily:

  SELECT x, y, COUNT(y) AS 'n' 
    FROM nodes 
   WHERE x = 2           /* Manually edit this number to get desired result */
GROUP BY y 
ORDER BY COUNT(y) DESC 
   LIMIT 1

Result:

when x = 2        2, 5, 2
when x = 5        5, 0, 2
when x = 255    255, 0, 5

Note: I don't need to output COUNT(y) AS 'n'. It's included here to make it easier to see what's going on.

I can do this in Python by iterating through the unique x values from the first query and plugging that number into the second query each loop. I can also just use a query like this and do all the work in Python:

SELECT x, y FROM nodes ORDER BY x

I would prefer to do it all in one query with SQL.

Is there a way to express this idea of "for each x in query #1, run query #1 with x in the WHERE clause" to deliver a single result out of one query operation?

CodePudding user response：

Use FIRST_VALUE() window function:

SELECT DISTINCT x, 
       FIRST_VALUE(y) OVER (PARTITION BY x ORDER BY COUNT(*) DESC) y
FROM tablename
GROUP BY x, y;

See the demo.