Home > database >  SELECT random 10% of rows for each category on SQL Server
SELECT random 10% of rows for each category on SQL Server

Time:01-23

There is a table of products sold.

row_id customer product date_sold
1 customer_1 thingamajig 01.01.2023
2 customer_12 whosi-whatsi 03.01.2023
3 customer_1 watchamacallit 04.01.2023
4 customer_4 whosi-whatsi 06.01.2023
... ... ... ...

There is always one row per one item.

Let's say customer_1 ordered 100 items total. customer_2 ordered 50 items total. customer_3 ordered 17 items total. How do you select random 10% of rows for each customer? The fraction of rows selected should be rounded up (for example 12 rows total results in 2 selected). That means every customer that bought at least one item should appear in the resulting table. In this case the resulting table for customer_1, customer_2 and customer_3 would have 10 5 2 = 17 rows.

My initial approach would be to create a temp table, calculate desired row counts for each customer and then loop through the temp table and select rows for each customer. Then insert them to another table and select from that one:

drop table if exists #row_counts

select
    customer
    ceiling(convert(decimal(10, 2), count(product)) / 10) as row_count
into #row_counts
from products_sold
group by customer

-- then use cursor to loop over #row_counts and insert into the final table
-- for randomness an 'order by newid()' will be used

But this just doesn't feel like the right solution...

CodePudding user response:

You need to know total count and a row count of what you want. Something like this can perhaps be of service:

EDITED due to it not being randomized properly:


select *
from ( 
   select row_number() over(partition by customerid order by  newid()) as sortOrder
   , COUNT(*) OVER(PARTITION BY customerID) AS cnt
   , *
   FROM products
 ) p
-- Now, we want 10% of total count rounded upwards
WHERE sortOrder <= CEILING(cnt * 0.1)
  • Related