I’d like to build a query that returns the top 1 million customers based off of two criteria that ranks 10 million customers.
Criterion 1 being a grade assigned to each customer from 1 to 5, 1 being the best
Criterion 2 being a grade assigned to each customer from A to E, A being the best
Criterion 1 outweighs Criterion 2, in that before you move to B (Criterion 2), you must first go from 1 to 5 (Criterion 1) within the A band (Criterion 2) i.e. A customer that scores a 5 (criterion 1) and an A (criterion 2), is a better customer than a customer that scores a 1 (criterion 1) and a B (criterion 2).
I’d like the query to return the top 1 million customers, stopping within the bands that return the 1 million-th customer e.g. if customer 1 million is in the 4C band, don’t return any customers beyond 4C. It’s ok if it's just over 1 million, to accommodate every customer in 4C band. This is my attempt at it but this doesn’t account for sequence:
SELECT *
FROM CUSTOMER_POPULATION
WHERE Criterion1 IN (5,4,3,2,1)
AND Criterion2 ('A','B','C','D','E')
LIMIT 1000000
TIA.
CodePudding user response:
WITH CUSTOMER_POPULATION (NAME, CRITERION1, CRITERION2) AS
(SELECT * FROM VALUES
('Alice',3,'A'),('Bob',4,'C'),('Carol',5,'E'),('Dave',2,'C')
,('Esther',2,'E'),('Fred',5,'C'),('Gladys',3,'E'),('Harvey',2,'E')
,('Iona',3,'C'),('John',1,'A'),('Kate',4,'E'),('Leo',3,'B')
,('Mary',2,'C'),('Nora',3,'A'),('Oscar',1,'D'),('Penny',3,'C')
,('Quincy',3,'A'),('Ruth',5,'E'),('Sam',4,'B'),('Tina',2,'C')
,('Ulrich',1,'B'),('Velma',5,'B'),('Wayne',2,'C'),('Xena',5,'B')
,('Yale',1,'D'),('Zoe',5,'C')
)
SELECT *
FROM CUSTOMER_POPULATION
WHERE Criterion1 IN (5,4,3,2,1)
AND Criterion2 IN ('A','B','C','D','E')
ORDER BY CONCAT(Criterion2, Criterion1)
LIMIT 1000000
NAME | CRITERION1 | CRITERION2 |
---|---|---|
John | 1 | A |
Alice | 3 | A |
Nora | 3 | A |
Quincy | 3 | A |
Ulrich | 1 | B |
Leo | 3 | B |
Sam | 4 | B |
Xena | 5 | B |
Velma | 5 | B |
Dave | 2 | C |
Wayne | 2 | C |
Tina | 2 | C |
Mary | 2 | C |
Iona | 3 | C |
Penny | 3 | C |
Bob | 4 | C |
Fred | 5 | C |
Zoe | 5 | C |
Oscar | 1 | D |
Yale | 1 | D |
Harvey | 2 | E |
Esther | 2 | E |
Gladys | 3 | E |
Kate | 4 | E |
Ruth | 5 | E |
Carol | 5 | E |
CodePudding user response:
rank()
will let you number the bands according their ordering by the two criteria. Because ties get the same ranking, you won't cut off the results in the middle of a band at exactly the one millionth row.
with data as (
select *, rank() over (order by Criterion2, Criterion1) as rnk
from CUSTOMER_POPULATION
where Criterion1 IN (5,4,3,2,1) and Criterion2 in ('A','B','C','D','E')
)
select * from data where rnk <= 1000000;