Home > Software engineering >  Return top 1 million based off of two criteria (SQL Query)
Return top 1 million based off of two criteria (SQL Query)

Time:07-20

I’d like to build a query that returns the top 1 million customers based off of two criteria that ranks 10 million customers.

Criterion 1 being a grade assigned to each customer from 1 to 5, 1 being the best

Criterion 2 being a grade assigned to each customer from A to E, A being the best

Criterion 1 outweighs Criterion 2, in that before you move to B (Criterion 2), you must first go from 1 to 5 (Criterion 1) within the A band (Criterion 2) i.e. A customer that scores a 5 (criterion 1) and an A (criterion 2), is a better customer than a customer that scores a 1 (criterion 1) and a B (criterion 2).

I’d like the query to return the top 1 million customers, stopping within the bands that return the 1 million-th customer e.g. if customer 1 million is in the 4C band, don’t return any customers beyond 4C. It’s ok if it's just over 1 million, to accommodate every customer in 4C band. This is my attempt at it but this doesn’t account for sequence:

SELECT *
FROM CUSTOMER_POPULATION 
WHERE Criterion1 IN (5,4,3,2,1)
AND Criterion2 ('A','B','C','D','E')
LIMIT 1000000

TIA.

CodePudding user response:

WITH CUSTOMER_POPULATION (NAME, CRITERION1, CRITERION2) AS
(SELECT * FROM VALUES
 ('Alice',3,'A'),('Bob',4,'C'),('Carol',5,'E'),('Dave',2,'C')
,('Esther',2,'E'),('Fred',5,'C'),('Gladys',3,'E'),('Harvey',2,'E')
,('Iona',3,'C'),('John',1,'A'),('Kate',4,'E'),('Leo',3,'B')
,('Mary',2,'C'),('Nora',3,'A'),('Oscar',1,'D'),('Penny',3,'C')
,('Quincy',3,'A'),('Ruth',5,'E'),('Sam',4,'B'),('Tina',2,'C')
,('Ulrich',1,'B'),('Velma',5,'B'),('Wayne',2,'C'),('Xena',5,'B')
,('Yale',1,'D'),('Zoe',5,'C')
)
SELECT *
FROM CUSTOMER_POPULATION 
WHERE Criterion1 IN (5,4,3,2,1)
AND Criterion2 IN ('A','B','C','D','E')
ORDER BY CONCAT(Criterion2, Criterion1)
LIMIT 1000000
NAME CRITERION1 CRITERION2
John 1 A
Alice 3 A
Nora 3 A
Quincy 3 A
Ulrich 1 B
Leo 3 B
Sam 4 B
Xena 5 B
Velma 5 B
Dave 2 C
Wayne 2 C
Tina 2 C
Mary 2 C
Iona 3 C
Penny 3 C
Bob 4 C
Fred 5 C
Zoe 5 C
Oscar 1 D
Yale 1 D
Harvey 2 E
Esther 2 E
Gladys 3 E
Kate 4 E
Ruth 5 E
Carol 5 E

CodePudding user response:

rank() will let you number the bands according their ordering by the two criteria. Because ties get the same ranking, you won't cut off the results in the middle of a band at exactly the one millionth row.

with data as (
    select *, rank() over (order by Criterion2, Criterion1) as rnk
    from CUSTOMER_POPULATION
    where Criterion1 IN (5,4,3,2,1) and Criterion2 in ('A','B','C','D','E')
)
select * from data where rnk <= 1000000;
  • Related