Home > Software engineering >  BigQuery SQL get top 3 results based on criteria
BigQuery SQL get top 3 results based on criteria

Time:10-11

I have scraped some data off Amazon for a client. I need to use that data to create an 'Above the fold' report which basically means that I have show the top 3 results on the page for each of the keywords that I've scraped.

Now, on Amazon, the search results are a mix of Sponsored results (which are paid) and organic results (which are free). The sponsored results are listed first, and the organic results after that.

The rule that I need to set in place is that for each keyword and each date, get the top 3 sponsored results. If there are no sponsored results, get the top three organic results. However, the added layer of complication is; if there are 2 sponsored results, I need to get those two and then get the first organic result.

A simplified view of my table looks like this:

keyword report_date rank_no result_type
test1   2022-10-10  1   Sponsored
test1   2022-10-10  2   Sponsored
test1   2022-10-10  3   Sponsored
test1   2022-10-10  4   Sponsored
test1   2022-10-10  5   Sponsored
test1   2022-10-10  1   Organic
test1   2022-10-10  2   Organic
test1   2022-10-10  3   Organic
test2   2022-10-10  1   Organic
test2   2022-10-10  2   Organic
test2   2022-10-10  3   Organic
test2   2022-10-10  4   Organic
test3   2022-10-10  1   Sponsored
test3   2022-10-10  2   Sponsored
test3   2022-10-10  1   Organic
test3   2022-10-10  2   Organic
test3   2022-10-10  3   Organic

Based on rank_no:

  • for test1, the results should be the top 3 sponsored results
  • for test2, the results should be the top 3 organic products
  • for test3, the result should be the top 2 sponsored products and the top organic product.

I have been googling to find a solution to this problem but I haven't found anything yet. If you can help me with a solution OR a suggestion on what to google to find a solution to this problem, I'd be very grateful.

Thank you for your help in advance!

CodePudding user response:

ROW_NUMBER() can be made to work here:

WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY keyword
                                 ORDER BY result_type DESC, rank_no) rn
    FROM yourTable
)

SELECT keyword, report_date, rank_no, result_type
FROM cte
WHERE rn <= 3
ORDER BY keyword, result_type DESC, rank_no;

The ordering used in ROW_NUMBER() above places Sponsored records ahead of Organic. In the case of multiple records of the same keyword type, the ordering uses the lower rank.

  • Related