Home > Software design >  Incorrect results when using last_value()
Incorrect results when using last_value()

Time:09-04

This is the table:

id region variety price
1 Alexander Valley Cabernet Sauvignon 35
2 Alexander Valley Cabernet Sauvignon 45
3 Alexander Valley Merlot 19
4 California Sauvignon Blanc 8
5 California Pinot Noir 17

I wanted to find out for each region, the cheapest and most expensive variety, so the output should be:

region expensive cheap
Alexander Valley Cabernet Sauvignon Merlot
California Pinot Noir Sauvignon Blanc

I was able to get the correct result using two first_value()

SELECT
  DISTINCT region,
  FIRST_VALUE(variety) OVER (PARTITION BY region ORDER BY price DESC) AS expensive,
  FIRST_VALUE(variety) OVER (PARTITION BY region ORDER BY price) AS cheapest
FROM wine_list

I thought it would be equivalent to the following query

SELECT
  DISTINCT region,
  FIRST_VALUE(variety) OVER (PARTITION BY region ORDER BY price DESC) AS expensive,
  LAST_VALUE(variety) OVER (PARTITION BY region ORDER BY price DESC) AS cheapest
FROM wine_list

However now my output is:

region expensive cheap
Alexander Valley Cabernet Sauvignon Cabernet Sauvignon
Alexander Valley Cabernet Sauvignon Merlot
California Pinot Noir Pinot Noir
California Pinot Noir Sauvignon Blanc

Why is my output wrong? I am so baffled.

CodePudding user response:

The default window for FIRST_VALUE and LAST_VALUE is ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. I.e. it is the first resp. last value "so far".

You want it for the whole dataset, however, so you must explicitely describe the window range:

SELECT DISTINCT
  region,
  FIRST_VALUE(variety) OVER 
    (PARTITION BY region ORDER BY price DESC
     ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS expensive,
  LAST_VALUE(variety) OVER 
     (PARTITION BY region ORDER BY price DESC
      ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS cheapest
FROM wine_list;  

CodePudding user response:

This one will work

with wine_cte (region, cheapest, mostexpensive)
as
(
Select Region,
 first_value(variety) Over (Partition By region order by min(price)) as cheapest,
 first_value(variety) Over (Partition By region order by min(price) desc) as mostexpensive,
From wine_list
group By region, variety, price
)
select distinct region, mostexpensive, cheapest from wine_cte
  • Related