I am using Redshift SQL and would like to sample by user_id, but I am not sure how to specify that.
Let's say my table looks like this:
user_id | date | other columns
1 | 2020-01-01 | ...
1 | 2020-02-01 | ...
2 | 2020-02-11 | ...
...
How do I select 10,000 random (user_id, date) pairs with dates between 2000-01-01 and 2020-01-01 in SQL?
CodePudding user response:
We can use ROW_NUMBER() with a random ordering:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (ORDER BY RANDOM()) rn
FROM yourTable
WHERE date BETWEEN '2000-01-01' AND '2020-01-01'
)
SELECT *
FROM cte
WHERE rn <= 10000;
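If the table can hold several rows for the same user_id and date, the query above samples rows rather than distinct pairs. A minimal sketch that deduplicates first, assuming the same placeholder table name yourTable and the column names from the question:

```sql
-- Sketch only: yourTable is a placeholder for the real table name.
WITH pairs AS (
    -- One row per (user_id, date) pair inside the date range.
    SELECT DISTINCT user_id, date
    FROM yourTable
    WHERE date BETWEEN '2000-01-01' AND '2020-01-01'
)
SELECT user_id, date
FROM pairs
ORDER BY RANDOM()   -- shuffle the distinct pairs
LIMIT 10000;        -- keep at most 10,000 of them
```

Because BETWEEN is inclusive, rows dated 2020-01-01 are eligible for the sample.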
CodePudding user response:
You can use ORDER BY RANDOM() with LIMIT (or TOP), after filtering on the date range:
select *
from <table>
where date between '2000-01-01' and '2020-01-01'
order by random()
limit 10000;
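If "sample by user_id" means picking random users and keeping all of their rows, rather than picking random rows, one possible approach is to sample the distinct user_ids first and join back. This is a sketch under that assumption; yourTable is a placeholder and sampled_users is a name introduced here for illustration:

```sql
-- Sketch only: samples up to 10,000 users, then returns every row
-- those users have inside the date range.
WITH sampled_users AS (
    SELECT user_id
    FROM (
        -- One row per user that has at least one row in the range.
        SELECT DISTINCT user_id
        FROM yourTable
        WHERE date BETWEEN '2000-01-01' AND '2020-01-01'
    ) AS u
    ORDER BY RANDOM()
    LIMIT 10000
)
SELECT t.*
FROM yourTable AS t
JOIN sampled_users AS s
  ON s.user_id = t.user_id
WHERE t.date BETWEEN '2000-01-01' AND '2020-01-01';
```

Note that this returns a variable number of rows, since each sampled user may have many rows in the range.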