This is the table:
id | region | variety | price |
---|---|---|---|
1 | Alexander Valley | Cabernet Sauvignon | 35 |
2 | Alexander Valley | Cabernet Sauvignon | 45 |
3 | Alexander Valley | Merlot | 19 |
4 | California | Sauvignon Blanc | 8 |
5 | California | Pinot Noir | 17 |
I wanted to find out for each region, the cheapest and most expensive variety, so the output should be:
region | expensive | cheap |
---|---|---|
Alexander Valley | Cabernet Sauvignon | Merlot |
California | Pinot Noir | Sauvignon Blanc |
I was able to get the correct result using two first_value()
SELECT
DISTINCT region,
FIRST_VALUE(variety) OVER (PARTITION BY region ORDER BY price DESC) AS expensive,
FIRST_VALUE(variety) OVER (PARTITION BY region ORDER BY price) AS cheapest
FROM wine_list
I thought it would be equivalent to the following query
SELECT
DISTINCT region,
FIRST_VALUE(variety) OVER (PARTITION BY region ORDER BY price DESC) AS expensive,
LAST_VALUE(variety) OVER (PARTITION BY region ORDER BY price DESC) AS cheapest
FROM wine_list
However now my output is:
region | expensive | cheap |
---|---|---|
Alexander Valley | Cabernet Sauvignon | Cabernet Sauvignon |
Alexander Valley | Cabernet Sauvignon | Merlot |
California | Pinot Noir | Pinot Noir |
California | Pinot Noir | Sauvignon Blanc |
Why is my output wrong? I am so baffled.
CodePudding user response:
The default window for FIRST_VALUE
and LAST_VALUE
is ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
. I.e. it is the first resp. last value "so far".
You want it for the whole dataset, however, so you must explicitely describe the window range:
SELECT DISTINCT
region,
FIRST_VALUE(variety) OVER
(PARTITION BY region ORDER BY price DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS expensive,
LAST_VALUE(variety) OVER
(PARTITION BY region ORDER BY price DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS cheapest
FROM wine_list;
CodePudding user response:
This one will work
with wine_cte (region, cheapest, mostexpensive)
as
(
Select Region,
first_value(variety) Over (Partition By region order by min(price)) as cheapest,
first_value(variety) Over (Partition By region order by min(price) desc) as mostexpensive,
From wine_list
group By region, variety, price
)
select distinct region, mostexpensive, cheapest from wine_cte