Home > Mobile >  SQL Distinct / GroupBy
SQL Distinct / GroupBy

Time:06-08

Ok, I’m stuck on an SQL query and tried long enough that it’s time to ask for help :) I'm using Objection.js – but that's not super relevant as I really just can't figure out how to structure the SQL.

I have the following example data set:

Items

id name
1 Test 1
2 Test 2
3 Test 3

Listings

id item_id price created_at
1 1 100 1654640000
2 1 60 1654640001
3 1 80 1654640002
4 2 90 1654640003
5 2 90 1654640004
6 3 50 1654640005

What I’m trying to do:

  • Return the lowest priced listing for each item
  • If all listings for an item have the same price, I want to return the newest of the two items
  • Overall, I want to return the resulting items by price

I’m trying to write a query that returns the data:

id item_id name price created_at
6 3 Test 3 50 1654640005
2 1 Test 1 60 1654640001
5 2 Test 2 90 1654640004

Any help would be greatly appreciated! I'm also starting fresh, so I can add new columns to the data if that would help at all :)


An example of where my query is right now:

select * from "listings" inner join (select "item_id", MIN(price) as "min_price" from "listings" group by "item_id") as "grouped_listings" on "listings"."item_id" = "grouped_listings"."item_id" and "listings"."price" = "grouped_listings"."min_price" where "listings"."sold_at" is null and "listings"."expires_at" > ? order by CAST(price AS DECIMAL) ASC limit ?;

This gets me listings – but if two listings have the same price, it returns multiple listings with the same item_id – not ideal.

CodePudding user response:

Aggregation functions, as the MIN function you employed in your query, is a viable option, yet if you want to have an efficient query for your problem, window functions can be your best friends. This class of functions allow to compute values over "windows" (partitions) of your table given some specified columns.

For the solution to this problem I'm going to compute two values using the window functions:

  • the minimum value for "listings.price", by partitioning on "listings.item_id",
  • the maximum value for "created_at", by partitioning on "listings.item_id" and listings.price
SELECT *,
       MIN(price)      OVER(PARTITION BY item_id)        AS min_price,
       MAX(created_at) OVER(PARTITION BY item_id, price) AS max_created_at
FROM listings

Once you have all records of listings associated to the corresponding minimum price and latest date, it's necessary for you to select the records whose

  • price equals the minimum price
  • created_at equals the most recent created_at
WITH cte AS (
    SELECT *,
           MIN(price)      OVER(PARTITION BY item_id)        AS min_price,
           MAX(created_at) OVER(PARTITION BY item_id, price) AS max_created_at
    FROM listings
)
SELECT id, 
       item_id, 
       price, 
       created_at
FROM cte
WHERE price = min_price
  AND created_at = max_created_at

If you need to order by price, it's sufficient to add a ORDER BY price clause.

Check the demo here.

CodePudding user response:

Given the postgresql tag, this should work:

with listings_numbered as (
  select *, row_number() over (
    partition by item_id
    order by price asc, created_at desc
  ) as rownum
  from listings
)
select l.id, l.item_id, i.name, l.price, l.created_at
from listings_numbered l
join items i on l.item_id=i.id
where l.rownum=1
order by price asc;

This is a bit of an advanced query, using window functions and a common table expression, but we can break it down.

with listings_numbered as (...) select simply means to run the query inside of the ..., and then we can refer to the results of that query as listings_numbered inside of the select, as though it was a table.

We're selecting all of the columns in listings, plus one more:
row_number() over (partition by item_id order by price asc, created_at desc). partition by item_id means that we would like the row number to reset for each new item_id, and the order by specifies the ordering that the rows should get within each partition before we number them: first increasing by price, then decreasing by creation time to break ties.

The result of the CTE listings_numbered looks like:

id item_id price created_at rownum
2 1 60 1654640001 1
3 1 80 1654640002 2
1 1 100 1654640000 3
5 2 90 1654640004 1
4 2 90 1654640003 2
6 3 50 1654640005 1

If you look at only the rows where rownum (the last column) is 1, then you can see that it's exactly the set of listings that you're interested in.

The outer query then selects from this this dataset, joins on items to get the name, filters to only the listings where rownum is 1, and sorts by price, to get the final result:

id item_id name price created_at
6 3 Test 3 50 1654640005
2 1 Test 1 60 1654640001
5 2 Test 2 90 1654640004
  • Related