Home > Enterprise >  Limiting output of rows based on count of values in another table?
Limiting output of rows based on count of values in another table?

Time:07-02

As a base example, I have a query that effectively produces a table with a list of values (ID numbers), each of which is attached to a specific category. As a simplified example, it would produce something like this (but at a much larger scale):

IDS Categories
12345 type 1
12456 type 6
77689 type 3
32456 type 4
67431 type 2
13356 type 2
..... .....

Using this table, I want to populate another table that gives me a list of ID numbers, with a limit placed on how many of each category are in that list, cross referenced against a sort of range based chart. For instance, if there are 5-15 IDS of type 1 in my first table, I want the new table with the column of IDS to have 3 type 1 IDS in it, if there are 15-30 type 1 IDS in the first table, I want to have 6 type 1 IDS in the new table.

This sort of range based limit would apply to each category, and the IDS would all populate the same column in the new table. The order, or specific IDS that end up in the final table don't matter, as long as the correct number of IDS end up as a part of that final list of ID numbers. This is being used to provide a semi-random sampling of ID numbers based on categories for a sort of QA related process.

If parts of this are unclear I can do my best to explain more. My initial thought was using a variable for a limit clause, but that isnt possible. I have been trying to sort out how to do this with a case statement but I'm really just not making any headway there but I feel like I am at this sort of paper thin wall I just can't break through.

CodePudding user response:

You can use two window functions:

  • COUNT to keep track of the amount of ids for each category
  • ROW_NUMBER to uniquely identify each id within each category

Once you have collected these information, it's sufficient to keep all those rows that satisfy either of the following conditions:

  • count of rows less or equal to 30 >> ranking less or equal to 6
  • count of rows less or equal to 15 >> ranking less or equal to 3
WITH cte AS (
    SELECT IDS,
           Categories,
           ROW_NUMBER() OVER(ORDER BY IDS PARTITION BY Categories) AS rn
           COUNT(IDS) OVER(PARTITION BY Categories)                AS cnt
    FROM tab
)
SELECT * 
FROM cte
WHERE (rn <= 3 AND cnt <= 15)
   OR (rn <= 6 AND cnt <= 30)

Note: If you have concerns regarding a specific ordering, you need to fix the ORDER BY clause inside the ROW_NUMBER window function.

  •  Tags:  
  • sql
  • Related