Home > Back-end >  excluding dups which are lower than max values in SQL
excluding dups which are lower than max values in SQL

Time:10-24

I have the following simple table (Table1), where each row is a student_ID and their name, and each student has one or multiple wins (Wins). I would like to output: Student_ID, Student_name, count of Wins, sorted by count of Wins (descending) and then Student_ID (ascending), excluding those students who have the same count of Wins which is less than the max of the Wins (i.e.5). In other words, Lizzy and Mark have the same count of wins, and 3 is lower than 5, so the output will exclude the two students, Lizzy and Mark.

From comments: "Betty, David and Cathy should be excluded", also.

Table1:

student_id student_name wins
1 John YES
1 John YES
1 John YES
1 John YES
1 John YES
2 Brandon YES
2 Brandon YES
2 Brandon YES
2 Brandon YES
2 Brandon YES
3 Lizzy YES
3 Lizzy YES
3 Lizzy YES
4 Mark YES
4 Mark YES
4 Mark YES
5 Betty YES
6 David YES
7 Cathy YES
8 Joe YES
8 Joe YES

Desired output:

student_id student_name cnt_wins
1 John 5
2 Brandon 5
8 Joe 2

Here is my SQL in Oracle. I can't figure out what went wrong. The log says "(SELECT b.cnt_wins, count(b.student_id) has too many values".

WITH st_cte AS
(SELECT student_id, student_name, count(wins) cnt_wins
FROM Table1
GROUP BY student_id, student_name
ORDER BY count(wins) DESC, student_id)
SELECT *
FROM st_cte a
WHERE a.cnt_wins not in
(SELECT b.cnt_wins, count(b.student_id)
 FROM st_cte b
 WHERE b.cnt_wins <
 (SELECT max(c.cnt_wins) FROM st_cte c)
 GROUP BY b.cnt_wins
 HAVING count(b.student_id) > 1);

CodePudding user response:

There are too many values selected inside the 'in' select:

WHERE a.cnt_wins -- 1 value
  not in
       (SELECT b.cnt_wins, count(b.student_id) -- 2 values
         FROM st_cte b

you shoud either do :

WHERE a.cnt_wins not in
       (SELECT b.cnt_wins
         FROM st_cte ...

or

 WHERE (a.cnt_wins, count(something)) not in
        (SELECT b.cnt_wins, count(b.student_id)
          FROM st_cte ...

CodePudding user response:

Updated based on updated requirements...

The requirement was ambiguous in that Betty, David, and Cathy seem to also meet the criteria to be removed from the result. This requirement was clarified and those rows should have been removed.

Logic has been added to allow only all max_cnt rows, plus any students with a unique count.

Also note that if wins can be any other non-null value, COUNT(wins) is not correct.

Given all that, maybe something like this is a starting point:

Fiddle

WITH cte AS (
        SELECT student_id, student_name
             , COUNT(wins) cnt_wins
             , MAX(COUNT(wins)) OVER () AS max_cnt
          FROM Table1
         GROUP BY student_id, student_name
     )
   , cte2 AS (
        SELECT cte.*
             , COUNT(*) OVER (PARTITION BY cnt_wins) AS cnt_students
          FROM cte
     )
SELECT student_id, student_name, cnt_wins
  FROM cte2
 WHERE max_cnt = cnt_wins
    OR cnt_students = 1
 ORDER BY cnt_wins DESC, student_id
;

and to handle wins that can be other non-null values:

WITH cte AS (
        SELECT student_id, student_name
             , COUNT(CASE WHEN wins = 'YES' THEN 1 END) cnt_wins
             , MAX(COUNT(CASE WHEN wins = 'YES' THEN 1 END)) OVER () AS max_cnt
          FROM Table1
         GROUP BY student_id, student_name
     )
   , cte2 AS (
        SELECT cte.*
             , COUNT(*) OVER (PARTITION BY cnt_wins) AS cnt_students
          FROM cte
     )
SELECT student_id, student_name, cnt_wins
  FROM cte2
 WHERE max_cnt = cnt_wins
    OR cnt_students = 1
 ORDER BY cnt_wins DESC, student_id
;

Result (with data to test the new requirement, one student (Joe) with unique counts (2)):

STUDENT_ID STUDENT_NAME CNT_WINS
1 John 5
2 Brandon 5
8 Joe 2

Setup:

CREATE TABLE table1 (
    Student_ID   int
  , Student_Name VARCHAR2(20)
  , Wins         VARCHAR2(10)
);


BEGIN
-- Assume only wins are stored.

INSERT INTO table1 VALUES ( 1, 'John', 'YES');
INSERT INTO table1 VALUES ( 1, 'John', 'YES');
INSERT INTO table1 VALUES ( 1, 'John', 'YES');
INSERT INTO table1 VALUES ( 1, 'John', 'YES');
INSERT INTO table1 VALUES ( 1, 'John', 'YES');
INSERT INTO table1 VALUES ( 2, 'Brandon', 'YES');
INSERT INTO table1 VALUES ( 2, 'Brandon', 'YES');
INSERT INTO table1 VALUES ( 2, 'Brandon', 'YES');
INSERT INTO table1 VALUES ( 2, 'Brandon', 'YES');
INSERT INTO table1 VALUES ( 2, 'Brandon', 'YES');

INSERT INTO table1 VALUES ( 3, 'Lizzy', 'YES');
INSERT INTO table1 VALUES ( 3, 'Lizzy', 'YES');
INSERT INTO table1 VALUES ( 3, 'Lizzy', 'YES');

INSERT INTO table1 VALUES ( 4, 'Mark', 'YES');
INSERT INTO table1 VALUES ( 4, 'Mark', 'YES');
INSERT INTO table1 VALUES ( 4, 'Mark', 'YES');

INSERT INTO table1 VALUES ( 5, 'Betty', 'YES');

INSERT INTO table1 VALUES ( 6, 'David', 'YES');
INSERT INTO table1 VALUES ( 7, 'Cathy', 'YES');

INSERT INTO table1 VALUES ( 8, 'Joe', 'YES');
INSERT INTO table1 VALUES ( 8, 'Joe', 'YES');
END;
/

Correction to the original query in the question:

WITH st_cte AS
  (SELECT student_id, student_name, count(wins) cnt_wins
     FROM Table1
    GROUP BY student_id, student_name
    ORDER BY count(wins) DESC, student_id
  )
SELECT *
  FROM st_cte a
 WHERE a.cnt_wins not in
   (SELECT b.cnt_wins
      FROM st_cte b
     WHERE b.cnt_wins < (SELECT max(c.cnt_wins) FROM st_cte c)
     GROUP BY b.cnt_wins
    HAVING count(b.student_id) > 1
   )
;
  • Related