Can somebody explain to me how this query below works? It's a query to calculate a median Latitude from the table. I get tired of understanding this and still, I can't.
SELECT *
FROM station as st
where (
select count(LAT_N)
from station
where LAT_N < st.LAT_N
) = (
select count(LAT_N)
from STATION where LAT_N > st.LAT_N
);
CodePudding user response:
The median is the middle value in a collection, which means there are as many values above it as below.
So for each row in the table, the first subquery counts the number of rows where LAT_N
is lower than in the current row, and the second counts the number of rows where it's higher. Then it only returns the rows where these counts are the same.
Note that this won't work in many situations. The simplest example is when there are an even number of distinct values:
1
2
3
4
The median should be 2.5
(the mean of the two middle values), which doesn't exist in the table.
Another case is when there are duplicate values:
1
1
2
3
the median should be 2
. But the count of lower values is 2
, while the count of higher values is 1
, so they won't be equal.
CodePudding user response:
Oh that's some clever code! It's got a bug, but here's what it's trying to do.
We define median as the value(s) that have the same number of values greater and less than them. So here's the query in pseudocode:
for each station in st:
compute number of stations with latitude greater than the current station's latitude
compute number of stations with latitude less than the current station's latitude
If these two values are equal, include it in the result.
Bug:
For tables with an even number of distinct values, the median should be defined as the mean of the two middle values. This code doesn't handle that.