Sample table columns:
x y
255 0
255 0
5 10
5 0
5 1
5 0
255 0
255 0
2 5
2 2
2 5
255 0
The first step is to identify unique values of X:
SELECT distinct(X) FROM nodes ORDER BY X
Result:
2
5
255
Note: Order by isn't necessary, it's there to make the result easier to read.
For each of these values in column x, I would like to find the most frequent value in column y (only for the rows sharing the same X value, not the entire table) and output that single row.
I can do that by hand very easily:
SELECT x, y, COUNT(y) AS 'n'
FROM nodes
WHERE x = 2 /* Manually edit this number to get desired result */
GROUP BY y
ORDER BY COUNT(y) DESC
LIMIT 1
Result:
when x = 2 2, 5, 2
when x = 5 5, 0, 2
when x = 255 255, 0, 5
Note: I don't need to output COUNT(y) AS 'n'
. It's included here to make it easier to see what's going on.
I can do this in Python by iterating through the unique x values from the first query and plugging that number into the second query each loop. I can also just use a query like this and do all the work in Python:
SELECT x, y FROM nodes ORDER BY x
I would prefer to do it all in one query with SQL.
Is there a way to express this idea of "for each x in query #1, run query #1 with x in the WHERE clause" to deliver a single result out of one query operation?
CodePudding user response:
Use FIRST_VALUE()
window function:
SELECT DISTINCT x,
FIRST_VALUE(y) OVER (PARTITION BY x ORDER BY COUNT(*) DESC) y
FROM tablename
GROUP BY x, y;
See the demo.