Home > Software design >  How to SELECT DISTINCT() and then keep rows with most frequent value in another column?
How to SELECT DISTINCT() and then keep rows with most frequent value in another column?

Time:08-09

Sample table columns:

  x    y
255    0
255    0
  5   10
  5    0
  5    1
  5    0
255    0
255    0
  2    5
  2    2
  2    5
255    0

The first step is to identify unique values of X:

SELECT distinct(X) FROM nodes ORDER BY X

Result:

  2
  5
255

Note: Order by isn't necessary, it's there to make the result easier to read.

For each of these values in column x, I would like to find the most frequent value in column y (only for the rows sharing the same X value, not the entire table) and output that single row.

I can do that by hand very easily:

  SELECT x, y, COUNT(y) AS 'n' 
    FROM nodes 
   WHERE x = 2           /* Manually edit this number to get desired result */
GROUP BY y 
ORDER BY COUNT(y) DESC 
   LIMIT 1

Result:

when x = 2        2, 5, 2
when x = 5        5, 0, 2
when x = 255    255, 0, 5

Note: I don't need to output COUNT(y) AS 'n'. It's included here to make it easier to see what's going on.

I can do this in Python by iterating through the unique x values from the first query and plugging that number into the second query each loop. I can also just use a query like this and do all the work in Python:

SELECT x, y FROM nodes ORDER BY x

I would prefer to do it all in one query with SQL.

Is there a way to express this idea of "for each x in query #1, run query #1 with x in the WHERE clause" to deliver a single result out of one query operation?

CodePudding user response:

Use FIRST_VALUE() window function:

SELECT DISTINCT x, 
       FIRST_VALUE(y) OVER (PARTITION BY x ORDER BY COUNT(*) DESC) y
FROM tablename
GROUP BY x, y;

See the demo.

  • Related