Is there a way to query SQL to get an even distribution of samples. For example if one of my fields is a State field... I want to query top 5000 results with (100 from each state)... Or another example, if I have a field that says whether a client is a new client or an existing client, and I want the top 500 results where 250 are new clients and 250 are existing clients.
I am trying to avoid two different queries that I have to manually combine the results.
CodePudding user response:
You can do this by using ROW_NUMBER
. You partition your data on one or more columns, so the row numbering starts from 1 in every partition. You then select the top x rows and ORDER BY the row number column.
e.g.
WITH cte
AS
(
SELECT *,ROW_NUMBER() OVER (PARTITION BY StateName ORDER BY NEWID() ) AS RN
FROM dbo.Sales
)
SELECT TOP 5 *
FROM cte
ORDER BY RN;