There is a table of products sold.
row_id | customer | product | date_sold |
---|---|---|---|
1 | customer_1 | thingamajig | 01.01.2023 |
2 | customer_12 | whosi-whatsi | 03.01.2023 |
3 | customer_1 | watchamacallit | 04.01.2023 |
4 | customer_4 | whosi-whatsi | 06.01.2023 |
... | ... | ... | ... |
There is always one row per one item.
Let's say customer_1
ordered 100 items total. customer_2
ordered 50 items total. customer_3
ordered 17 items total. How do you select random 10% of rows for each customer? The fraction of rows selected should be rounded up (for example 12 rows total results in 2 selected). That means every customer that bought at least one item should appear in the resulting table. In this case the resulting table for customer_1
, customer_2
and customer_3
would have 10 5 2 = 17 rows.
My initial approach would be to create a temp table, calculate desired row counts for each customer and then loop through the temp table and select rows for each customer. Then insert them to another table and select from that one:
drop table if exists #row_counts
select
customer
ceiling(convert(decimal(10, 2), count(product)) / 10) as row_count
into #row_counts
from products_sold
group by customer
-- then use cursor to loop over #row_counts and insert into the final table
-- for randomness an 'order by newid()' will be used
But this just doesn't feel like the right solution...
CodePudding user response:
You need to know total count and a row count of what you want. Something like this can perhaps be of service:
EDITED due to it not being randomized properly:
select *
from (
select row_number() over(partition by customerid order by newid()) as sortOrder
, COUNT(*) OVER(PARTITION BY customerID) AS cnt
, *
FROM products
) p
-- Now, we want 10% of total count rounded upwards
WHERE sortOrder <= CEILING(cnt * 0.1)