PostgreSQL - Find the most expensive and cheapest wine per region-CodePudding

Most Expensive And Cheapest Wine Find the cheapest and the most expensive variety in each region. Output the region along with the corresponding most expensive and the cheapest variety.

I'm trying to solve this question from Stratascratch, following the hint was given on the platform. Please help review my answer and would love to know the better way to solve this. TIA.

SELECT EX.region_1, EX.expensive_variety, CH.cheap_variety

FROM (SELECT A.region_1, A.expensive_variety

FROM 
    (SELECT distinct region_1, variety AS expensive_variety, price,
        ROW_NUMBER() OVER (PARTITION BY region_1 ORDER BY price desc) as most_expensive
    FROM winemag_p1
    ORDER BY region_1 asc) A
    WHERE A.most_expensive = 1) EX

INNER JOIN

(SELECT B.region_1, B.cheap_variety

FROM 
(SELECT distinct region_1, variety as cheap_variety, price,
    ROW_NUMBER() OVER (PARTITION BY region_1 ORDER BY price ASC) as cheapest
FROM winemag_p1
ORDER BY region_1 asc) B
WHERE B.cheapest = 1) CH

ON EX.region_1 = CH.region_1

CodePudding user response：

Something like this, the MIN and MAX per region:

SELECT region
, MIN(price) AS cheapest
, MAX(price) AS most_expensive
FROM table_name
GROUP BY region;

CodePudding user response：

You can find both in the same sub-query.

SELECT 
B.region_1, 
MAX(CASE WHEN cheapest = 1 then variety else '' end) cheapest_variety,
MAX(CASE WHEN cheapest = 1 then price else 0 end) cheapest_price,
MAX(CASE WHEN expensive = 1 then variety else '' end) expensive_variety,
MAX(CASE WHEN expensive = 1 then price else 0 end) expensive_price
FROM 
(SELECT distinct region_1, variety as cheap_variety, price,
    ROW_NUMBER() OVER (PARTITION BY region_1 ORDER BY price ASC) as cheapest,
    ROW_NUMBER() OVER (PARTITION BY region_1 ORDER BY price DESC) as expensive
FROM winemag_p1
) B
WHERE cheapest = 1 OR expensive = 1
GROUP BY region_1
ORDER BY region_1;
``

CodePudding user response：

You can use window functions or subqueries to get the highest and lowest prices per region. Then get all rows with these prices and aggregate per region.

For instance:

select
  region_1,
  min(price) as low_price,
  string_agg(variety, ', ') filter (where price = min_price) as low_price_varieties,
  max(price) as high_price,
  string_agg(variety, ', ') filter (where price = max_price) as high_price_varieties
from
(
  select 
    region_1, variety, price,
    min(price) over (partition by region_1) as min_price,
    max(price) over (partition by region_1) as max_price
  from winemag_p1
) with_min_and_max
where price in (min_price, max_price)
group by region_1
order by region_1;

As to your own query: This is an okay query. Here are my remarks:

ORDER BY in a subquery only makes sense, when limiting the rows (with FETCH FIRST ROWS), because a query result is an unordered data set by definition.
Why DISTINCT? There are no duplicates to remove.
You don't handle ties. If there are two top wines with the same price in a region for instance, you pick one arbirarily.