I have scraped some data off Amazon for a client. I need to use that data to create an 'Above the fold' report which basically means that I have show the top 3 results on the page for each of the keywords that I've scraped.
Now, on Amazon, the search results are a mix of Sponsored results (which are paid) and organic results (which are free). The sponsored results are listed first, and the organic results after that.
The rule that I need to set in place is that for each keyword and each date, get the top 3 sponsored results. If there are no sponsored results, get the top three organic results
. However, the added layer of complication is; if there are 2 sponsored results, I need to get those two and then get the first organic result.
A simplified view of my table looks like this:
keyword report_date rank_no result_type
test1 2022-10-10 1 Sponsored
test1 2022-10-10 2 Sponsored
test1 2022-10-10 3 Sponsored
test1 2022-10-10 4 Sponsored
test1 2022-10-10 5 Sponsored
test1 2022-10-10 1 Organic
test1 2022-10-10 2 Organic
test1 2022-10-10 3 Organic
test2 2022-10-10 1 Organic
test2 2022-10-10 2 Organic
test2 2022-10-10 3 Organic
test2 2022-10-10 4 Organic
test3 2022-10-10 1 Sponsored
test3 2022-10-10 2 Sponsored
test3 2022-10-10 1 Organic
test3 2022-10-10 2 Organic
test3 2022-10-10 3 Organic
Based on rank_no:
- for test1, the results should be the top 3 sponsored results
- for test2, the results should be the top 3 organic products
- for test3, the result should be the top 2 sponsored products and the top organic product.
I have been googling to find a solution to this problem but I haven't found anything yet. If you can help me with a solution OR a suggestion on what to google to find a solution to this problem, I'd be very grateful.
Thank you for your help in advance!
CodePudding user response:
ROW_NUMBER()
can be made to work here:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY keyword
ORDER BY result_type DESC, rank_no) rn
FROM yourTable
)
SELECT keyword, report_date, rank_no, result_type
FROM cte
WHERE rn <= 3
ORDER BY keyword, result_type DESC, rank_no;
The ordering used in ROW_NUMBER()
above places Sponsored
records ahead of Organic
. In the case of multiple records of the same keyword type, the ordering uses the lower rank.